Verilog: Frequently Asked Questions

Language, Applications and Extensions

Shivakumar Chonnad
Needamangalam Balachander

Springer



eBook ISBN: 0-387-22899-3 

Print ISBN: 0-387-22834-9 



©2004 Springer Science + Business Media, Inc. 



Print ©2004 Springer Science + Business Media, Inc. 
Boston 

All rights reserved 



No part of this eBook may be reproduced or transmitted in any form or by any means, electronic, 
mechanical, recording, or otherwise, without written consent from the Publisher 



Created in the United States of America 



Visit Springer's eBookstore at: http://www.ebooks.kluweronline.com

and the Springer Global Website Online at: http://www.springeronline.com



To our wives, Manjula Chonnad
and Jayanthi Balachander

To our children, Akshata Chonnad,
Puja Balachander, and Manya Balachander




Contents

Dedication
Contributing Authors
Foreword
Preface
Acknowledgments

1 BASIC VERILOG

1.1 Assignments
1.1.1 What are the differences between continuous and procedural assignments?
1.1.2 What are the differences between assignments in initial and always constructs?
1.1.3 What are the differences between blocking and nonblocking assignments?
1.1.4 How can I model a bi-directional net with assignments influencing both source and destination?

1.2 Tasks and Functions
1.2.1 What are the differences between a task and a function?
1.2.2 Are tasks and functions re-entrant, and how are they different from static task and function calls? Illustrate with an example
1.2.3 How can I override variables in an automatic task?
1.2.4 What are the restrictions of using automatic tasks?
1.2.5 How can I call a function like a task, that is, not have a return value assigned to a variable?
1.2.6 What are the rules governing usage of a Verilog function?

1.3 Parameters
1.3.1 How can I override a module’s parameter values during instantiation?
1.3.2 What are the rules governing parameter assignments?
1.3.3 How do I prevent selected parameters of a module from being overridden during instantiation?
1.3.4 What are the differences between using `define, and using either parameter or defparam for specifying variables?
1.3.5 What are the pros and cons of specifying the parameters using the defparam construct vs. specifying during instantiation?
1.3.6 What is the difference between the specparam and parameter constructs?
1.3.7 What are derived parameters? When are derived parameters useful, and what are their limitations?

1.4 Ports
1.4.1 What are the different approaches of connecting ports in a hierarchical design? What are the pros and cons of each?
1.4.2 Can there be full or partial no-connects to a multi-bit port of a module during its instantiation?
1.4.3 What happens to the logic after synthesis, that is driving an unconnected output port that is left open (that is, no-connect) during its module instantiation?
1.4.4 What value is sampled by the logic from an input port that is left open (that is, no-connect) during its module instantiation?
1.4.5 How is the connectivity established in Verilog when connecting wires of different widths?
1.4.6 Can I use a Verilog function to define the width of a multi-bit port, wire, or reg type?




2 RTL DESIGN

2.1 Assignments
2.1.1 What logic is inferred when there are multiple assign statements targeting the same wire?
2.1.2 What do conditional assignments get inferred into?
2.1.3 What is the logic that gets synthesized when conditional operators in a single continuous assignment are nested?
2.1.4 What value is inferred when multiple procedural assignments are made to the same reg variable in an always block?
2.1.5 Why should a nonblocking assignment be used for sequential logic, and what would happen if a blocking assignment were used? Compare it with the same code in a combinatorial block

2.2 Tasks and Functions
2.2.1 What does the logic in a function get synthesized into? What are the area and timing implications of calling functions in RTL?
2.2.2 What are a few important considerations while writing a Verilog function?
2.2.3 What does the logic in a task get synthesized into? Explain with an example
2.2.4 What are the differences between using a task, and defining a module for implementing reusable logic?
2.2.5 Can tasks and functions be declared external to the scope of module-endmodule?

2.3 Storage Elements
2.3.1 Summary of RTL templates for different flip-flop types
2.3.2 Summary of RTL templates for different latch types
2.3.3 What are the considerations to be taken choosing between flip-flops vs. latches in a design?
2.3.4 Which one is better, asynchronous or synchronous reset for the storage elements?
2.3.5 What logic gets synthesized when I use an integer instead of a reg variable as a storage element? Is use of integer recommended?

2.4 Flow-control Constructs
2.4.1 How do I choose between a case statement and a multi-way if-else statement?
2.4.2 How do I avoid a priority encoder in an if-else tree?
2.4.3 What are the differences between if-else and the (“?:”) conditional operator?
2.4.4 What is the importance of a default clause in a case construct?
2.4.5 What is the difference between full_case and parallel_case synthesis directive?
2.4.6 What is the difference in implementation with sequential and combinatorial processes, when the final else clause in a multi-way if-else construct is missing?
2.4.7 What is the difference in using (== or !=) vs. (=== or !==) in decision making of a flow control construct in a synthesizable code?
2.4.8 Explain the differences and advantages of casex and casez over the case statement?

2.5 Finite State Machines
2.5.1 What are the differences between synchronous and asynchronous state machines?
2.5.2 Illustrate the differences between Mealy and Moore state machines
2.5.3 Illustrate the differences between binary encoding and one-hot encoding mechanisms state machines
2.5.4 Explain a reversed case statement, and how it can be useful to infer a one-hot state machine?

2.6 Memories
2.6.1 Illustrate how a multi-dimensional array is implemented.
2.6.2 What are the considerations in instantiating technology-specific memories?
2.6.3 What are the factors that dictate the choice between synchronous and asynchronous memories?

2.7 General Design Considerations
2.7.1 What are some reusable coding practices for RTL Design?
2.7.2 What are “snake” paths, and why should they be avoided?
2.7.3 What are a few considerations while partitioning large designs?

2.8 Multiple clock Design Considerations
2.8.1 How can I reliably convey control information across clock domains?
2.8.2 What is a safe strategy to transfer data of different bus-widths and across different clock domains?
2.8.3 What are a few considerations while using FIFOs for posted writes or prefetched reads that influence the speed of the design?

2.9 Common “Gotchas” in Synthesizable RTL
2.9.1 What will be synthesized of a module with only inputs and no outputs?
2.9.2 Why do I see latches in my synthesized logic?
2.9.3 What are “combinatorial timing loops”? Why should they be avoided?
2.9.4 How does the sensitivity list of a combinatorial always block affect pre- and post-synthesis simulation? Is this still an issue lately?

2.10 Coding techniques for Area Minimization
2.10.1 How do the `ifdef, `ifndef, `elsif, `endif constructs aid in minimizing area?
2.10.2 What is “constant propagation”? How can I use constant propagation to minimize area?
2.10.3 What happens to the bits of a reg which are declared, but not assigned or used?
2.10.4 How does the generate construct help in optimal area?
2.10.5 What is the difference between using `ifdef and generate for the purpose of area minimization?
2.10.6 Can the generate construct be nested?

2.11 Coding for Better Static Timing Optimization
2.11.1 What is a critical path in a design? What is the importance of understanding the critical path?
2.11.2 How does proper partitioning of design help in achieving static timing?
2.11.3 What does it mean to “retime” logic between registers? How does it affect functionality?
2.11.4 Why is one-hot encoding preferred for FSMs designed for high-speed designs?

2.12 Design for Testability (DFT) considerations
2.12.1 What are the main factors that affect testability of a design?
2.12.2 My chip has on-chip tri-state buses. What are the testability implications, and how do I take care of it?
2.12.3 Some Flip-Flops in my chip have their resets driven by other Flip-Flops within the chip. How will this affect the testability, and what’s the workaround?
2.12.4 I have derived clocks in my chip. What are the testability implications, and what is the workaround for it?
2.12.5 My chip is power sensitive, and, hence, there are gated clocks in it. What are its testability implications and workaround?
2.12.6 What is the implication of a combinatorial feedback loops in design testability?
2.12.7 How does the presence of latches affect the testability, and what’s the workaround?

2.13 Power Reduction considerations
2.13.1 What are the various methods to contain power during RTL coding?
2.13.2 Illustrate how the switching of data input to the Flip-Flops helps in power reduction
2.13.3 What is the drawback of using the enable flip-flop to reduce the power consumption?
2.13.4 Illustrate an example of clock gating to help in reduction of power
2.13.5 What are the side effects of latched clock gating logic, and how is it fixed?
2.13.6 What are a few other techniques of power saving that can be achieved during the RTL design stage?
2.13.7 What are a few system level techniques, apart from RTL, that can influence in the reduction of power for the chip?
2.13.8 What are a few power reduction techniques that can be achieved through static timing?
2.13.9 What are a few power reduction techniques that can be implemented during the backend analysis?
2.13.10 What are a few power reduction techniques that can be implemented during board design?



3 VERIFICATION

3.1 Messaging
3.1.1 What are a few considerations while implementing messaging in a model?
3.1.2 What are the different kinds of message severity levels?
3.1.3 Illustrate an example of how message levels are implemented in a BFM

3.2 Behavioral Functional Models (BFMs)
3.2.1 What is a Bus Functional Model (BFM)?
3.2.2 What are a few considerations that go into designing a BFM?
3.2.3 What is a typical flow in designing a BFM?
3.2.4 How can BFMs be used to inject intentional errors in the stimulus?

3.3 Bus Monitors
3.3.1 What are the main responsibilities of a bus monitor?
3.3.2 Illustrate with an example, the design of a bus monitor
3.3.3 What other considerations go into designing a Monitor?

3.4 Random stimulus generation
3.4.1 Explain with an example, how do I generate random numbers in Verilog?
3.4.2 Explain with an example, how do I generate random stimulus?
3.4.3 How do I generate constrained random stimulus using Verilog?
3.4.4 How can I be sure that the constrained random stimulus has covered all the values in the range without repetition in a cyclic random fashion? Illustrate this with an example
3.4.5 How can I change the sequence of constrained random stimulus? Illustrate this with an example
3.4.6 What is weighted random stimulus? Illustrate this with an example
3.4.7 What metrics help in defining the completeness of the random simulations?

3.5 Stimulus generation
3.5.1 What are some stimulus generation techniques when the stimulus is not reproducible using BFMs? Illustrate these with specific examples using Verilog

3.6 Gate level simulations
3.6.1 What is SDF back-annotation, and how is it implemented in Verilog testbench?
3.6.2 What are a few pre-requisites before running gate level simulations?
3.6.3 What is the difference between unit delay and full timing simulations?
3.6.4 My gate simulation is not passing, and some tests hang. What are the key points to look for?



4 MISCELLANEOUS

4.1.1 What is the difference between a vectored and a scalared net?
4.1.2 What is the difference between assign-deassign and force-release?
4.1.3 What is the order of precedence when both assign-deassign and force-release are used on the same variable?
4.1.4 How can I abort execution of a task or a block of code?
4.1.5 What are the differences between the looping constructs forever, repeat, while, for, and do-while?
4.1.6 What is the difference between based and unbased numbers?
4.1.7 What does it mean to “short-circuit” the evaluation of an expression?
4.1.8 What is the difference between the logical (==) and the case (===) equality operators?
4.1.9 What are the differences and similarities between the logical (<<, >>) and the arithmetic (<<<, >>>) shift operators?
4.1.10 What is the difference between a constant part-select and an indexed part-select of a vectored net?
4.1.11 Illustrate how memory indirection is achieved in Verilog.
4.1.12 What is the logic synthesized when a non-constant is used as an index in a bit-select?
4.1.13 How are string operands stored as constant numbers in a reg variable?
4.1.14 How can I typecast an expression to control its sign?
4.1.15 What are the pros and cons of using hierarchical names to refer to Verilog objects?
4.1.16 Does Verilog support an (a b) operator?
4.1.17 What is the main limitation of fork-join in Verilog, and how is this overcome in SystemVerilog?
4.1.18 Can I return from a function without having it disabled?
4.1.19 What is strobing? How do I selectively strobe a net?
4.1.20 Summarize the main differences between $strobe and $monitor.
4.1.21 How can I selectively enable or disable monitoring?
4.1.22 How can I specify arguments on the Verilog simulator’s command line?
4.1.23 Can the `define be used for text substitution through variable instead of literal substitution only?

5 COMMON MISTAKES

5.1 Some common errors that are not detected at compile-time
5.1.1 What are some ways a race condition can get created, and how can these race conditions be avoided?
5.1.2 Illustrate how the infinite loops get created in the looping constructs like forever, while and for
5.1.3 Illustrate the side-effects of specifying a function without a range
5.1.4 Illustrate how the errors of passing arguments to a function in incorrect order is eliminated in SystemVerilog
5.1.5 Using tri-state logic inside a chip
5.1.6 Illustrate the side effects of not having a final else clause in an if-else construct
5.1.7 What is the side effect of not having a default clause in a case construct
5.1.8 Illustrate example of how unintentional deadlocked situations can happen during simulation
5.1.9 Having a programmed loop that does not move simulation-time
5.1.10 Illustrate the side effect of leaving an input port unconnected that influences a logic to an output port
5.1.11 Illustrate the side effect of not connecting all the ports during instantiation
5.1.12 Illustrate the side effect of forgetting to increase the width of state registers as more states get added in a state machine.
5.1.13 Illustrate the side effect of an implicit 1 bit wire declaration of a multi-bit port during instantiation
5.1.14 Same variable used in two loops running simultaneously
5.1.15 Illustrate the side effects of multiple processes writing to the same variable
5.1.16 Illustrate the side effect of specifying delays in assignments




6 VERILOG DURING SIMULATION REGRESSIONS

6.1.1 Illustrate a few important considerations on simulation regressions, and how Verilog can be useful for achieving the same
6.1.2 What coding constructs of Verilog can be used during the various stages of designing a regression environment for simulations?

References

Index




Contributing Authors 



Shivakumar Chonnad is a Staff Engineer at Synopsys Inc. He has been 
working in the industry for over 15 years, covering the various stages of 
ASIC Design & Verification, from specification to hardware validation. 
Shiv currently deals with IP based design and Verification. Shiv has a 
Bachelor’s degree in Electronics and Communications Engineering from the 
Karnatak University, India. Shiv’s areas of professional interest include 
Design and Verification of IPs. 

Needamangalam Balachander is a CAE Manager at Synopsys Inc. He has 
been working in the industry for over 15 years, covering the areas of
system/board-level design & diagnostics, ASIC Design and Verification, and 
currently deals with mixed-signal IP design and support issues. Bala has a 
Bachelor’s degree in Electronics and Communications Engineering from the 
Indian Institute of Science in Bangalore, India. He also holds a B.S. degree in
Physics. Bala’s areas of professional interest include Formal Verification 
methodologies, timing abstractions of mixed-signal IPs, and ATPG issues in 
mixed-signal IPs. 




Foreword 



The Verilog Hardware Description Language was first introduced in 
1984. Over the 20 year history of Verilog, every Verilog engineer has 
developed his own personal “bag of tricks” for coding with Verilog. These 
tricks enable modeling or verifying designs more easily and more accurately. 
Developing this bag of tricks is often based on years of trial and error. 
Through experience, engineers learn that one specific coding style works 
best in some circumstances, while in another situation, a different coding 
style is best. 

As with any high-level language, Verilog often provides engineers 
several ways to accomplish a specific task. Wouldn’t it be wonderful if an 
engineer first learning Verilog could start with another engineer’s bag of 
tricks, without having to go through years of trial and error to decide which 
style is best for which circumstance? That is where this book becomes an 
invaluable resource. The book presents dozens of Verilog tricks of the trade 
on how to best use the Verilog HDL for modeling designs at various levels of
abstraction, and for writing test benches to verify designs. The book not only 
shows the correct ways of using Verilog for different situations, it also 
presents alternate styles, and discusses the pros and cons of these styles. 

When I first received a draft of this book to look over, I expected to read
a book that would only be of interest to the beginning Verilog user. I quickly
discovered that the tricks of the trade presented in this book are not just for
the novice. Even engineers with many years of experience with Verilog will
likely find insights on using Verilog, and additional tidbits that they can add
to their own bag of tricks. Both novice and experienced Verilog engineers
will also benefit from the many references in the book on using the newest
generation of Verilog, SystemVerilog.

The authors of this book have done a great job of making it easier for all 
engineers to become masters of Verilog. 



Stuart Sutherland 
Verilog, SystemVerilog and PLI Consultant
Sutherland HDL, Inc. 
www.sutherland-hdl.com 




Preface 



Verilog has been a popular Hardware Description Language (HDL) since
the mid-1980s. Its popularity has increased with the addition of many new
enhancements. Some key reasons for the adoption of Verilog as the
language of choice for designers are the simplicity of the language usage and 
the availability of high-performance simulators from multiple EDA vendors, 
which results in reduced execution time for large regression simulations. 

As with any other programming language, experienced users of Verilog are
fully aware of the language’s capabilities, and have amassed a “bag of 
tricks”, gathered in the course of execution of multiple projects. Beginners to 
the language are often consumed by questions relating to the implications of 
coding styles on synthesis, static timing, power etc. It is important to factor 
in these functional and environmental implications as part of the RTL coding 
stage of the ASIC design process. Not doing so could result in expensive 
iteration cycles. 

This book is for digital designers who use Verilog as the HDL for their 
design and verification. This book will also be useful to those who have 
learned Verilog, and would like to use the various language-constructs, but 
have questions on the capabilities of these constructs. Although the same 
functionality can be implemented by coding in many different styles, some 
of the questions that arise during coding would be: 



Is this the right construct to infer the required logic? 

Is this the best way to implement the required functionality? 
Does this approach help in meeting the design constraint? 




By reading this book, the user is presented with: 

• Multiple coding styles that are appropriate to specific design constraints 
such as area, timing, power, etc. 

• Examples of logic inferred for different constructs or coding styles 

• Illustrations of commonly encountered problems, so that the user can
incorporate the style or approach that helps eliminate the problem a priori

• Implications of particular approaches or styles on design constraints. 

We assume that the user has a very basic familiarity with the Verilog 
HDL. Readers who have a basic or intermediate level of expertise in the 
language can also refer to this book to know more implementation details of 
using the HDL in the different contexts of design, verification and 
implications to synthesis, static timing, etc. 

In this book, the authors have delved into many different front end topics 
of RTL such as synthesis, area, power, testability, etc. Most issues typically
encountered during these stages have been presented in the form of FAQs. 
Whenever there is more than one approach to meet a requirement, the pros 
and cons of each approach are presented. 

We hope the book will also interest students who are learning Verilog for 
the first time. We believe that this book provides answers to many questions 
that normally pop up as students begin to use the language. 

This book deals only with the front end issues, i.e., until completion of 
functional verification and synthesis with estimated wiring information. The 
book does not discuss any back-end issues like placement, floor-planning, or 
routing. The back-end processes are highly customized to the tools that 
implement them. Wherever appropriate, the implications of the coding style 
that would have an effect on the back-end steps are illustrated. This helps 
avoid expensive iterations in revisiting the golden code, in order to eliminate 
these back-end gotchas. 

This book does not aim to teach the Verilog language to a novice user.
Instead, we endeavour to address the various issues that typically arise in 
Verilog based chip design projects. Users who wish to learn Verilog from 
scratch may also refer to the Verilog Language Reference Manual (LRM), or 
some of the excellent books already available like “The Verilog Hardware 
Description Language” by Thomas & Moorby, and “SystemVerilog for 
Design” by Stuart Sutherland, et al. The details of the syntax and the
constructs, etc. are not explained within the book, and readers can refer to
the LRM for this. In case of any contradiction of the contents in this book
with the LRM, the content in the LRM is the final authority.

Throughout the book, we have tried to use simple examples that illustrate 
the point that is being made regarding the capability of the language. In 
certain examples where the illustrated RTL might not have been the most 
optimal way to code, we have deliberately illustrated it sub-optimally, to 
show what functionality or logic gets inferred out of that style of code. These 
simple working examples can be extrapolated and used in larger designs. A 
few times, only a snippet of the full RTL is presented, without the obligatory 
declarations (such as module, endmodule, input, output, etc.). These are
assumed predefined by the users. Wherever appropriate, we have also 
included simplified schematics of the outcome of the synthesized results. 

We have verified every RTL example with a simulator and a synthesis 
tool. In order to illustrate some of the capabilities or the limitations in the 
language, we have coded some RTL examples in particular styles, or using 
particular constructs. For the most part however, we have coded RTL 
examples in the most timing and area optimal approach. 

Although this book does not provide the answers to all the possible 
questions that can arise, we hope it will address the most commonly 
encountered problems. We believe that this book will help readers make 
more informed choices between approaches in achieving functionality and 
constraints in their VLSI projects. Based on the feedback we receive, and
more findings of interesting issues, we hope to keep this as an ongoing 
activity of incorporating more FAQs and their answers in the future editions. 

This book is unique, because it addresses complex language issues, along 
with guidelines to address the coding, timing and synthesis issues, reliability 
of designs, and verification in the form of FAQs. It captures many scenarios 
and issues that have been encountered while dealing with complex pieces of 
IP during various stages of the project cycle. It also addresses the three 
versions of Verilog that current users must contend with: 

Verilog ‘95 

Verilog 2001 

SystemVerilog 3.1a 

Wherever applicable, we have also compared the coding semantics
between the different Verilog versions from Verilog-95 to SystemVerilog.

These topics are organized into chapters as follows:

Chapter 1: Basic Verilog discusses a few important constructs of
Verilog and compares their implications in a Verilog-based
environment.

Chapter 2: RTL Design discusses the various RTL design and
synthesis related FAQs. This chapter will be of real interest to the RTL 
designers as it discusses the comparison of different coding constructs and 
styles. The chapter also discusses issues seen during design for area, timing, 
testability and power. 

Chapter 3: Verification emphasizes using Verilog constructs for
Verification. The various issues and considerations for design of Bus
Functional Models and Bus Monitors are discussed in this chapter. This
chapter will be of special interest to readers with verification responsibilities. 
It also discusses the various mechanisms of random stimulus generation and 
examples of the different mechanisms. 

Chapter 4: Miscellaneous has all the FAQs that do not explicitly fall in
any of the above chapters of RTL and Verification. It discusses the subtle 
and interesting scenarios of using Verilog at a system level. 

Chapter 5: Common Mistakes illustrates most of the commonly made
mistakes in the use of Verilog for design or verification. The chapter
discusses how functional issues can go undetected even though the code
compiles without any errors. Workarounds to prevent or detect these
mistakes have also been illustrated appropriately.

Chapter 6: Verilog during Simulation Regressions illustrates the
different requirements seen during simulation regression, and how different 
constructs of Verilog can be incorporated within the testbench that will help 
during regressions. 

Verilog is a registered trademark of Cadence Design Systems.

Since the above chapters have been categorized to address the different
topics like design and verification separately, some readers may find it
suitable to directly begin with these chapters. The authors, however,
recommend reading from Chapter 1 onwards until the end, to understand the
different issues presented throughout the design cycle.

Also, the Table of Contents consists directly of the FAQs themselves.
Therefore, by simply browsing through the Table of Contents, readers can
determine if their particular questions or topics have been dealt with in the
book.

Chapter 1

BASIC VERILOG 



INTRODUCTION 

This chapter addresses frequently asked questions on the basics of the 
Verilog hardware description language. This chapter deals with FAQs on 
Verilog assignments, tasks, functions, parameters, and ports. These 
constructs form a large section of the Verilog code and interconnection in 
designs. 

1.1 Assignments 

The following section discusses the different kinds of assignments that 
are possible in Verilog, and what their features are. 

1.1.1 What are the differences between continuous and procedural 
assignments? 

The following table captures the differences between continuous and 
procedural assignments: 



Table 1-1. Differences between continuous and procedural assignments

Continuous assignment                   Procedural assignment
--------------------------------------  --------------------------------------
Assigns values primarily to nets        Assigns values primarily to reg
                                        variables

Variables and nets continuously         Results of calculations involving
drive values onto ports                 variables and nets can be stored into
                                        variables

Used to infer combinatorial logic       Used to infer both storage elements
                                        like flip-flops and latches and also
                                        combinatorial logic

Assignment occurs whenever the          The value of the previous assignment
value on the RHS of the expression      is held until another assignment is
changes as a continuous process         made to the variable

Occurs in assignments to wire, port,    Occurs in constructs like always,
and net type                            initial, task, function

For example,                            For example,

wire out1 = in1 & in2;                  always @(posedge clk)
or                                        reg1 <= in1;
assign out1 = in1 & in2;                always @(a or b or s)
                                          y = (s == 1) ? a : b;
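
The contrast in Table 1-1 can be tried out with a small simulation. The following is only a minimal sketch: the module name assign_demo and the stimulus timing are illustrative assumptions and are not taken from the text. It drives a net with a continuous assignment and a reg with a clocked procedural assignment, then prints both after the input changes.

```verilog
// Hypothetical testbench; module and signal names are illustrative only.
module assign_demo;
  reg  in1, in2, clk;
  wire out1;   // net: target of a continuous assignment
  reg  reg1;   // variable: target of a procedural assignment

  // Continuous assignment: out1 is re-evaluated whenever in1 or in2 changes.
  assign out1 = in1 & in2;

  // Procedural assignment: reg1 is updated only on a rising clock edge
  // and holds its previous value in between.
  always @(posedge clk)
    reg1 <= in1;

  initial begin
    clk = 0; in1 = 0; in2 = 1;
    #1 clk = 1;   // rising edge: reg1 samples in1 (0)
    #1 clk = 0;
    #1 in1 = 1;   // out1 follows at once; reg1 keeps its old value
    #1 $display("out1=%b reg1=%b", out1, reg1);   // prints out1=1 reg1=0
    #1 clk = 1;   // next rising edge: reg1 samples in1 (1)
    #1 $display("out1=%b reg1=%b", out1, reg1);   // prints out1=1 reg1=1
    $finish;
  end
endmodule
```

The first printout shows the net already reflecting the new input while the reg still holds the value stored at the previous clock edge. The second shows the reg catching up after the next edge.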



1.1.2 What are the differences between assignments in initial and 
always constructs? 

While both initial and always constructs contain procedural assignments,
they differ in the following ways:



Table 1-2. Differences between initial and always blocks

initial                                 always
--------------------------------------  --------------------------------------
Assignments in an initial block begin   Assignments in an always block also
to execute from time 0 in simulation,   begin from time 0, and repeat forever
and proceed in the specified            as a function of the changes on the
sequence.                               block's sensitivity list

Execution of statements in an initial   Execution continuously repeats from
begin-end block stops when the end      the begin to the end of the process
of the block is reached, i.e.,          unless held by a wait construct
executed only once during simulation    throughout the simulation session

Non-synthesizable construct             Synthesizable construct

For example,                            For example,

reg [1:0] out1, out2;                   reg [1:0] out1, out2;
initial begin                           always @(posedge clk)
  out1 = 2'b10;                         begin
  #5 out2 = 2'b01;                        out1 <= in1;
end                                       out2 <= out1 & in2;
                                        end

1.1.3 What are the differences between blocking and nonblocking 
assignments? 

While both blocking and nonblocking assignments are procedural 
assignments, they differ in behaviour with respect to simulation and logic 
synthesis as follows: 

Table 1-3. Differences between blocking and nonblocking assignments 

Blocking assignments | Nonblocking assignments 
-------------------- | ----------------------- 
In a blocking assignment, the evaluation of the expression on the RHS is updated to the LHS variable autonomously based on the delay value (either 0 if no delay is specified, or scheduled as a future event if a non-0 value is specified) | Nonblocking assignment to the LHS is scheduled to occur when the next evaluation cycle occurs in simulation, and not immediately. Updates are not available immediately within the same time unit 
When multiple blocking assignments are present in a process, the trailing assignments are blocked from occurring until the current assignment is completed | Multiple nonblocking assignments can be scheduled to occur concurrently on the next evaluation cycle in simulation 
There is a possibility of race conditions on the variables of blocking assignments if assignments happen to them from two processes concurrently | The race conditions are avoided, as the updated value is assigned after evaluation 
Recommended for use within combinatorial always blocks | Recommended for use within sequential always blocks 
Can be used in procedural blocks like initial and always, and in continuous assignments to nets like assign statements | Can be used only in procedural blocks like initial and always; continuous assignment to nets, like the assign statement, is not permitted 
Represented by the "=" operator sign between LHS and RHS | Represented by the "<=" operator sign between LHS and RHS 

Example of blocking assignments: 

    initial begin 
        reg1 = #10 2'b10; 
        reg2 = #5 2'b01; 
    end 

Starting from time 0, reg1 will be assigned 2'b10 at time 10 units and 
reg2 will be assigned 2'b01 at time 15 units. The assignment to reg2 
happens after the assignment to reg1. 

Example of nonblocking assignments: 

    initial begin 
        reg1 <= #10 2'b10; 
        reg2 <= #5 2'b01; 
    end 

Starting from time 0, reg2 will be assigned 2'b01 at time 5 units and 
reg1 will be assigned 2'b10 at time 10 units. The assignment to reg2 
happens earlier than the assignment to reg1. 
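
The scheduling difference in Table 1-3 is easiest to see when two variables try to exchange values. The following is a minimal sketch for simulation only; the module name swap_demo and the variables a, b, x, and y are illustrative names that do not come from the text above.

```verilog
module swap_demo;
  reg [3:0] a, b;   // updated with blocking assignments
  reg [3:0] x, y;   // updated with nonblocking assignments

  initial begin
    a = 4'd1; b = 4'd2;
    x = 4'd1; y = 4'd2;

    // Blocking: a is updated first, so b then reads the new value of a.
    a = b;
    b = a;

    // Nonblocking: both RHS values are sampled before either LHS
    // is updated, so x and y exchange values.
    x <= y;
    y <= x;

    // Wait one time unit so the nonblocking updates have taken effect.
    #1 $display("a = %0d, b = %0d, x = %0d, y = %0d", a, b, x, y);
  end
endmodule
```

With the blocking pair, both a and b end up holding 2, whereas the nonblocking pair swaps, leaving x = 2 and y = 1. This is the behaviour that makes nonblocking assignments the usual choice for sequential logic.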



1.1.4 How can I model a bi-directional net with assignments 
influencing both source and destination? 

The assign statement constitutes a continuous assignment. The changes 
on the RHS of the statement immediately reflect on the LHS net. However, 
any changes on the LHS don't get reflected on the RHS. For example, in the 
following statement, changes to the rhs net will update the lhs net, but not 
vice versa. 

wire rhs, lhs; 
assign lhs = rhs; 

SystemVerilog has introduced the keyword alias, which can be used only 
on nets to create a two-way assignment. For example, in the following code, 
any change to rhs is reflected on lhs, and vice versa. 

module test_alias; 

wire [3:0] lhs, rhs; 

alias lhs = rhs; // two way assignment 

initial begin 
    force rhs = 4'h2; 
    $display("lhs = %0h, rhs = %0h", lhs, rhs); 
    release rhs; 

    force lhs = 4'hc; 
    $display("lhs = %0h, rhs = %0h", lhs, rhs); 
    release lhs; 
end 

endmodule // test_alias 

Had the alias statement above been an assign statement, the two 
$display calls would print the following: 

lhs = 2, rhs = 2 
lhs = c, rhs = z 

However, with the alias command as it is, the outputs are as follows: 

lhs = 2, rhs = 2 
lhs = c, rhs = c 

In the above example, any change to either side of the net gets reflected 
on the other side. 

1.2 Tasks and Functions 

This section discusses FAQs on tasks and functions in Verilog. It also 
discusses a few enhancements to these constructs in SystemVerilog. 

1.2.1 What are the differences between a task and a function? 

Both tasks and functions in Verilog help in executing common 
procedures from different places in a module. Essentially, they provide a 
“subroutine” mechanism for reusing the same section of code at different 
places in a module. Avoiding this replication results in cleaner code that 
is easier to maintain. 

However, tasks and functions differ in the following aspects: 

Table 1-4. Differences between tasks and functions 

task | function 
---- | -------- 
Can contain time control statements like @(posedge ...) and the delay operator (#) | Executes in zero simulation time 
Can call any number of functions or tasks within itself | Can call any number of functions within itself 
Cannot return a value when called; instead, the task can have output arguments | Returns a single value when called. In SystemVerilog the return value can optionally be voided 

Example of a task call: gt_result is an output of a task that calculates 
the greater of two input arguments arg1 and arg2: 

    greater_val(arg1, arg2, gt_result); 

Example of a function call: gt_result is assigned the return value of a 
function that calculates the greater of two input arguments arg1 and arg2: 

    gt_result = greater_val(arg1, arg2); 
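
The two calling styles in Table 1-4 can be compared side by side in a complete module. The sketch below is an illustration only: the names greater_val_task and greater_val_func are assumptions (a task and a function cannot share one name in the same scope), as are the 8-bit widths and the test values.

```verilog
module task_vs_function;
  reg [7:0] arg1, arg2;
  reg [7:0] gt_task, gt_func;

  // Task: the result is passed back through an output argument.
  task greater_val_task;
    input  [7:0] a, b;
    output [7:0] result;
    begin
      result = (a > b) ? a : b;
    end
  endtask

  // Function: the result is returned through the function name.
  function [7:0] greater_val_func;
    input [7:0] a, b;
    begin
      greater_val_func = (a > b) ? a : b;
    end
  endfunction

  initial begin
    arg1 = 8'd12;
    arg2 = 8'd7;
    greater_val_task(arg1, arg2, gt_task);    // a task call is a statement
    gt_func = greater_val_func(arg1, arg2);   // a function call is an expression
    $display("gt_task = %0d, gt_func = %0d", gt_task, gt_func);
  end
endmodule
```

Both variables receive the value 12. The task form could additionally contain delays or event controls, which the function form cannot.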



1.2.2 Are tasks and functions re-entrant, and how are they different 
from static task and function calls? Illustrate with an 
example. 

In Verilog-95, tasks and functions were not re-entrant. From Verilog-2001 
onwards, tasks and functions can be declared re-entrant. A re-entrant 
task has the keyword automatic between the keyword task and the name of 
the task. The presence of the keyword automatic replicates and allocates the 
variables within a task dynamically for each task entry during concurrent 
task calls, i.e., the values don’t get overwritten by other task calls. Without 
the keyword, the variables are allocated statically, which means these 
variables are shared across different task calls, and can hence get overwritten 
by each task call. 

The following example illustrates the effect of the keyword automatic 
for re-entrant tasks. This code is non-synthesizable and is for the purpose of 
illustration only. 

module modify_taskval; 

integer out_val; 

task automatic modify_value; 
    input [1:0] in_value; 
    output [3:0] out_value; 
    reg [1:0] my_value; 
    begin 
        // syntax error to use nonblocking assignment with 
        // automatic variables 
        my_value = in_value; // blocking assignment 
        #5 
        $display("my_value = \t%0d, t = %0d", 
                 my_value, $time); 
        out_value = my_value + 2; 
    end 
endtask 

initial begin 
    fork 
        begin // First parallel call 
            #1 
            $display("in1 = \t\t%0d, t = %0d", 2, $time); 
            modify_value(2, out_val); 
        end 

        begin // Second parallel call 
            #2 
            $display("in2 = \t\t%0d, t = %0d", 3, $time); 
            modify_value(3, out_val); 
        end 
    join 
end 

endmodule 

In the above example, my_value is a local variable in the task 
modify_value. Whenever this task is called, the input in_value is 
assigned to the local variable immediately, and the value of the local 
variable is displayed 5 simulation time units later. Within the initial-begin, 
there is a fork-join, which launches two parallel processes. One calls the 
task after simulation time unit #1, and the other after #2. The first process 
passes a value of 2 to the task, and the second one passes a value of 3. 
Running the simulation with the above code, but without the automatic 
keyword, provides the following display: 



in1 =           2, t = 1 // passed value is 2 
in2 =           3, t = 2 
my_value =      3, t = 6 // retained value is 3 
my_value =      3, t = 7 



The sequence of events without the keyword automatic is as follows: 

1. Both processes of the fork-join are launched at time 0. 

2. The first process calls modify_value after #1, and the local variable 
my_value is assigned the value 2. This happens at t=1. 

3. The second process calls modify_value after #2, and the local variable 
my_value is assigned the value 3. This happens at t=2. Note that the 
earlier value of 2 assigned to the local variable my_value is now 
overwritten with the value 3. 

4. Four time units later, i.e., at t = 1+5 = 6, the $display of the first task 
call executes. Since the latest value is now “3”, based on the previous 
step, the value “3” is displayed for my_value, instead of the “2” that 
was passed. 

5. Similarly, for the second process, at t = 2+5 = 7, the $display of the 
second task call executes. Since the latest value is still “3”, the value 
“3” is displayed for my_value here too. 

The critical overwrite happened in step 3 above, wherein the launch of 
the second process overwrote the value stored by the first process before 
the first process had its turn to display it. This occurred because, without 
the automatic keyword, the variables within the task are static, and shared 
by all calls to the task. 

Now, with the keyword automatic between the task and task name, the 
following is the output: 

in1 =           2, t = 1 // passed value is 2 
in2 =           3, t = 2 
my_value =      2, t = 6 // passed value 2 preserved 
my_value =      3, t = 7 

Following the same steps as above, this time, due to the presence of the 
keyword automatic, the unique values of the variables are preserved in each 
call, and not overwritten by the subsequent task calls before the variable is 
being used. 

The same explanation holds true for recursive function calls, where a 
function calls itself: the keyword automatic must be placed between the 
keyword function and the function name. 
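
As an illustration of a recursive function, the following sketch computes a factorial. The module and function names (recursive_function_demo, factorial) and the use of integer arguments are assumptions chosen for the example; they are not taken from the text above.

```verilog
module recursive_function_demo;
  // Without "automatic", the argument n would be a single static
  // variable shared (and overwritten) by every level of recursion.
  function automatic integer factorial;
    input integer n;
    begin
      if (n <= 1)
        factorial = 1;
      else
        factorial = n * factorial(n - 1);
    end
  endfunction

  initial
    $display("5! = %0d", factorial(5));
endmodule
```

Each nested call gets its own copy of n, so the multiplication at every level uses the value that was passed to that level, and the result displayed is 120.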

Note that the keyword automatic has influence only within the current 
hierarchy of the concurrent task calls. The same task called within separate 
module hierarchy doesn’t overlap, and hence the need for automatic 
construct doesn’t exist for that scenario. 

The following table summarizes the differences between a reentrant task 
and a static task: 

Table 1-5. Differences between reentrant and static tasks 

Reentrant task | Static task 
-------------- | ----------- 
Has the keyword automatic between the task keyword and the identifier | Doesn’t have the keyword automatic between the task keyword and the identifier 
Variables declared within the task are allocated dynamically for each concurrent task call | Variables declared within the task are allocated statically 
All variables are replicated in each concurrent call to store state specific to that invocation | Each concurrent call to the task will OVERWRITE the statically allocated local variables of the task from all other concurrent calls to the task 
Variables declared are de-allocated at the end of the task invocation | Variables retain their values between invocations 
Task items cannot be accessed by hierarchical references | Task items can be accessed by hierarchical references 
Task items shall be allocated new across all uses of the task executing concurrently | Task items can be shared across all uses of the task executing concurrently 



1.2.3 How can I override variables in an automatic task? 

By default, all variables in a module are static, i.e., these variables are 
replicated for each instance of a module. However, in the case of a task or 
function, either the task/function itself or the variables within it can be 
defined as static or automatic. The following explains the behaviour of 
the different combinations of the task/function and/or its variables 
being declared either static or automatic. 

1. No automatic definition of task/function or its variables 

This is the Verilog-1995 format, wherein the task/function and its 
variables are implicitly static. The variables are allocated only once. 
Without the automatic keyword, multiple calls to the task/function 
overwrite its variables. 

2. static task/function definition 

SystemVerilog introduced the keyword static. When a task/function is 
explicitly defined as static, its variables are allocated only once, and can 
be overwritten. This scenario is exactly the same as the previous one. 

3. automatic task/function definition 

From Verilog-2001 onwards, and included within SystemVerilog, when 
the task/function is declared as automatic, its variables are also implicitly 
automatic. Hence, during multiple calls of the task/function, the variables 
are allocated each time and replicated without any overwrites. 

4. static task/function and automatic variables 

SystemVerilog also allows the use of automatic variables in a static 
task/function. Variables that are not explicitly declared automatic remain 
implicitly static. This is useful in scenarios wherein the implicit static 
variables need to be initialised before the task call, while the automatic 
variables are allocated afresh on each call. 

5. automatic task/function and static variables 

SystemVerilog also allows the use of static variables in an automatic 
task/function. Variables that are not explicitly declared static remain 
implicitly automatic. This is useful in scenarios wherein the static 
variables need to be updated on each call, whereas the rest are allocated 
afresh on each call. 

1.2.4 What are the restrictions of using automatic tasks? 

The following are the restrictions of using automatic tasks: 

• Only blocking assignments can be used on automatic variables. Refer to 
the earlier FAQ 1.2.2 for an example on this. 

• The variables in an automatic task shall not be referenced by procedural 
continuous assignments or procedural force statements. In the following 
code, the variable my_value in the task cannot be referenced by an 
assign statement. 

task automatic modify_value; 
    input [1:0] in_value; 
    reg [1:0] my_value; 
    begin 
        my_value = in_value; 
    end 
endtask 

initial begin 
    force modify_value.my_value = 1; // not allowed 
    $monitor(modify_value.my_value); // not allowed 
end 

• They shall not be traced by system calls like $monitor and $dumpvars 
as illustrated in the above example. 

1.2.5 How can I call a function like a task, that is, not have a 
return value assigned to a variable? 

Up to and including Verilog-2001, a function must return a value of type 
reg, integer, real, time, or realtime, and the code calling the function must 
receive the return value. For example, the following is a syntax error: 

function my_funct; 

endfunction 

initial begin 
    my_funct(...); // MUST have a destination 
end 

The call in the above example is a syntax error, since the call of 
my_funct does not have a destination. Only a task can be called without a 
destination value. 

SystemVerilog has introduced the construct void to facilitate a voided 
function call, that is, a function call with no destination. This makes 
a function call similar to a task call. With SystemVerilog, functions 
can also have output and inout arguments. The following example illustrates 
a voided function call: 

module func_1bit; 

reg [31:0] int_result; // Global variable 

function void my_func; 
    input [31:0] in1; 
    input [31:0] in2; 
    output [31:0] out1; 

    // no need to assign the function 
    // my_func = in1 + in2; 

    int_result = in1 + in2; 
endfunction 

initial begin 
    my_func(3, 4, int_result); // no destination required 
    $display("int_result = %0d", int_result); 
end 

endmodule 

The above example displays the result of int_result = 7. Some key 
observations in the above example are: 

• The assignment to the function my_func was not required, since its 
return value is void. 

• The 32 bit return range between the keyword function and my_func 
was also not required, since it is now a void return. 

• The call of the function my_func within the initial-begin-end does not 
require a destination, since the return has been voided. 

• Some other intermediate variable like int_result declared in the 
above example at the scope of that module can still be modified within 
the voided function. 

• SystemVerilog also allows functions with a return to be called as a task 
by casting the function call to void. For example: 

initial 
    void'(my_func(...)); 

1.2.6 What are the rules governing usage of a Verilog function? 

The following rules govern the usage of a Verilog function construct: 

• A function cannot advance simulation time, using constructs like #, @, 
etc. 

• A function shall not have nonblocking assignments. 

• A function without a range defaults to a one bit reg for the return value. 

• It is illegal to declare another object with the same name as the function 
in the scope where the function is declared. 
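
The rules above can be seen together in a small function. The following is a sketch under assumed names (function_rules_demo, parity, data, p) that are not part of the original text; it declares a function without a range, so the return value is a one-bit reg, and uses only a blocking assignment with no timing controls.

```verilog
module function_rules_demo;
  reg [7:0] data;
  reg       p;

  // No range is given, so the return value defaults to a one-bit reg.
  // The body uses only a blocking assignment and no #, @, or wait.
  function parity;
    input [7:0] value;
    begin
      parity = ^value;   // reduction XOR over all eight bits
    end
  endfunction

  initial begin
    data = 8'b1011_0000;       // three bits set, so the parity is odd
    p = parity(data);
    $display("parity of %b = %b", data, p);
  end
endmodule
```

Declaring a reg or wire named parity in the same module would violate the last rule, since the function name already occupies that scope.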

1.3 Parameters 

The following section discusses a few questions about the usage of 
parameters, the pros and cons of the different approaches, and what’s new in 
SystemVerilog regarding parameters. 

1.3.1 How can I override a module’s parameter values during 
instantiation? 

If a Verilog module uses parameters, there are two ways to override their 
values. Note that only parameters can be overridden in these ways. The 
localparam and specparam constants cannot. 

1.3.1.1 During instantiation 

In this method, the new values are assigned inline during module 
instantiation. There are two ways to override during instantiation. 

1.3.1.1.1 Assignment by ordered list 

In this method, the order in which the parameters are assigned follows the 
order in which they are declared within the module. For example, the 
module parameter_list contains three parameters, that is, width, depth, 
and num_buses, that have been assigned default values within the module. 
It is instantiated in the following module, example_ordered_list, with 
examples of these parameters overridden with different values in different 
instantiations. 

module parameter_list (addr, data); // 1995 format 
    parameter width = 32; 
    parameter depth = 64; 
    parameter num_buses = 4; 

    input [width-1:0] addr; 
    input [depth-1:0] data; 
endmodule 

The same example can be written in the Verilog-2001 format as follows. 
The parameter declarations, which previously sat between the module 
header and the input/output declarations, are now placed before the module 
port list. 

module parameter_list 
    #(parameter width = 32, // 2001 format 
      parameter depth = 64, 
      parameter num_buses = 4) 
    (addr, data); 

    input [width-1:0] addr; 
    input [depth-1:0] data; 
endmodule 

module example_ordered_list; 
    reg [127:0] a; 
    reg [255:0] b; 
    reg [63:0] c; 
    reg [31:0] d; 

    // Instantiating parameter_list module and 
    // overriding width only 
    parameter_list #(128) U0 (a, c); 

    // Instantiating parameter_list module and 
    // overriding width and depth only 
    parameter_list #(128, 256) U1 (a, b); 

    // Instantiating parameter_list module and 
    // overriding num_buses only 
    parameter_list #(32, 256, 8) U2 (d, b); 
endmodule 

The restriction of using the above method is: 

• The parameter override values have to be contiguous, that is, no 
parameter can be skipped during override. For example, in the above 
code with the U2 instantiation, the parameters width and depth cannot be 
skipped when the intent is to override num_buses only. 

Two methods to overcome this restriction are: 

• Declare the parameters that are likely to change first within the module, 
placing the subset that doesn’t change later in the order. For example, in 
the above code with the U0 and U1 instantiations, num_buses was not 
required to be changed, and was last in the order. The default value of 4, 
assigned to it within the module, holds in these two instantiations. 

• Assign values to ALL the parameters, including the ones that don’t need 
to be changed. In instantiation U2, although only the num_buses 
parameter needed to be changed, the width and depth parameters still 
had to be assigned values (the default values from the module definition, 
if they are not meant to change). 

1.3.1.1.2 Assignment by name 

This is a new feature, available from Verilog-2001 onwards. It is a 
better approach to overriding module parameters, in which each parameter 
is overridden by explicitly specifying its name and its overriding value. 
This way, the parameter value is linked to its name, and not to its position 
of declaration. 

Using the same module parameter_list as defined above, the 
following example shows the same parameter overriding, this time 
specifying by name. 

module example_by_name; 
    reg [127:0] a; 
    reg [255:0] b; 
    reg [63:0] c; 
    reg [31:0] d; 

    // Instantiating parameter_list module and 
    // overriding width only 
    parameter_list #(.width(128)) U0 (a, c); 

    // Instantiating parameter_list module and 
    // overriding width and depth 
    parameter_list #(.width(128), .depth(256)) 
        U1 (a, b); 

    // Instantiating parameter_list module and 
    // overriding depth only 
    parameter_list #(.depth(256)) U2 (d, b); 
endmodule 

Note that explicit parameter names were followed by their overriding 
values in the parenthesis. In the case of U2, just specifying the depth was 
sufficient, without having to specify anything for width parameter. 

1.3.1.2 Using defparam 

In this method, the parameter within a module is accessed by its 
hierarchical name from anywhere within the scope of the hierarchy. In the 
following example, the lower level module parameter_list gets 
instantiated in the example_defparam module. But the values of width 
and depth are overridden using the defparam construct. 

module example_defparam; 
    reg [127:0] a; 
    reg [255:0] b; 
    reg [63:0] c; 
    reg [31:0] d; 

    // Instantiating parameter_list module and 
    // overriding width only 
    parameter_list U0 (a, c); 
    defparam U0.width = 128; 

    // Instantiating parameter_list module and 
    // overriding width and depth 
    parameter_list U1 (a, b); 
    defparam U1.width = 128; 
    defparam U1.depth = 256; 

    // Instantiating parameter_list module and 
    // overriding depth only 
    parameter_list U2 (d, b); 
    defparam U2.depth = 256; 
endmodule 

The following bullet items summarize the advantages of using the 
defparam approach: 

• The ordered sequence need not be maintained in overriding the 
parameter values. 

• A specific parameter can be overridden rather than re-specifying all the 
parameters prior to the one that’s being overridden. 

• Can help with code maintenance by grouping all the defparams 
collectively in a single place, which can be compiled with the rest of the 
code. 

Parameter redefinition at instantiation is the style recommended by most 
expert Verilog users. There are several reasons to avoid using defparam for 
parameter redefinition. Some of the reasons are: 

1. The defparam statements, if not collected in one place, 
can be buried in any module, anywhere in the design hierarchy, 
making code difficult to maintain or reuse (a form of spaghetti 
code, which should always be avoided). 

2. Since the defparam statements can be buried anywhere in the 
hierarchy, they can prevent the Verilog language compilers from 
being able to do true independent compilation of the modules. 

3. Since multiple defparam statements can be made to the same 
parameter instance, the final value of the parameter in this 
situation can be (and probably will be) different with different tools. 

4. The defparam statements are not supported in the official IEEE 
1364.1-2002 synthesis subset for Verilog. 

5. The IEEE 1364 standards committee is considering a proposal to 
deprecate defparam in the next version of the Verilog standard, 
making defparam an obsolete construct. 

1.3.2 What are the rules governing parameter assignments? 

The rules governing the parameter assignments are as follows: 

• The parameter override at instantiation can be done either by specifying 
an ordered list or by name, but not a mix of both. For example, the 
following is an incorrect way of specifying both width and depth. 

parameter_list #(128, .depth(256)) U_wrong (a, b); 

• While assigning the parameter during instantiation, once a parameter 
has been assigned a value, there cannot be another assignment to the 
same parameter. For example, specifying the width parameter twice 
within the same instantiation is illegal. 

parameter width = 64; 
parameter width = 128; 

// Specifying the same parameter more than once 
// is an Error 

• If a parameter is assigned both by a defparam and in the module’s 
instantiation, the defparam’s assignment takes precedence. In the 
following example, the width parameter is given the value 128 at 
instantiation, but a defparam to the same parameter with the value 64 
follows it. The defparam gets precedence, and width will finally have the 
value 64. 

parameter_list #(128) U1 (a, b); 

defparam U1.width = 64; // This statement "wins" 
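
The precedence rule above can be observed directly by letting the instantiated module report its own parameter. This is a minimal sketch; the modules width_reporter and precedence_demo are hypothetical and stand in for parameter_list, with the ports dropped to keep the example self-contained.

```verilog
// Hypothetical leaf module that prints the value its parameter
// finally resolves to, together with its hierarchical name (%m).
module width_reporter;
  parameter width = 32;
  initial $display("%m: width = %0d", width);
endmodule

module precedence_demo;
  // Override at instantiation ...
  width_reporter #(128) U1 ();

  // ... and a defparam to the same parameter of the same instance.
  defparam U1.width = 64;
endmodule
```

A simulator that follows the rule stated above should report a width of 64 for precedence_demo.U1, not 128 or the default of 32.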

1.3.3 How do I prevent selected parameters of a module from being 
overridden during instantiation? 

If a particular parameter within a module should be prevented from 
being overridden, then it should be declared using the localparam construct, 
rather than the parameter construct. The localparam construct was 
introduced in Verilog-2001. In all other respects, a localparam behaves 
identically to a parameter. In the following example, the 
localparam construct is used to specify num_bits, and hence trying to 
override it directly gives an error message. 

module localparam_list (addr, data); 
    parameter width = 32; 
    parameter depth = 64; 
    localparam num_bits = width * depth; 

    input [width-1:0] addr; 
    input [depth-1:0] data; 
endmodule 

Note, however, that since width and depth are specified using the 
parameter construct, they can be overridden during instantiation or using 
defparam, and hence will indirectly override the num_bits value. 

In general, localparam constructs are useful in defining new and 
localized identifiers whose values are derived from regular parameters. 

1.3.4 What are the differences between using `define, and using 
either parameter or defparam for specifying variables? 

Both `define and parameter constructs can be used to specify constants 
in the design. For example, the width parameter can be specified either as 
a `define or a parameter, as: 

`define width 64 

if (`width == 64) ... 

or 

parameter width = 64; 
if (width == 64) ... 

However, the following are a few differences in using the two constructs: 

Table 1-6. Differences between `define and parameter/defparam 

`define | parameter/defparam 
------- | ------------------ 
`define is basically a text substitution macro | A parameter is used to specify constants in a design 
Multiple `defines of the same macro name simply redefine it; the value in effect is determined by source code order | Although multiple parameter definitions of the same name are not allowed within a module, multiple defparams to the same parameter are allowed; however, the final value of the parameter is then indeterminate 
Cannot be overridden by any mechanism | A parameter can be overridden 
Only one constant with the given name can exist in the full scope | Multiple modules can have the same parameter name, as it is limited to that scope only 
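
The scoping difference in Table 1-6 can be sketched as follows. All names here (BUS_WIDTH, fifo_a, fifo_b, define_vs_parameter) are hypothetical and chosen only for illustration.

```verilog
// A text macro is global: it is visible to all code compiled after it.
`define BUS_WIDTH 16

module fifo_a;
  parameter width = 8;    // local to fifo_a
  initial $display("%m: width = %0d, macro = %0d", width, `BUS_WIDTH);
endmodule

module fifo_b;
  parameter width = 32;   // same name, but an independent constant
  initial $display("%m: width = %0d, macro = %0d", width, `BUS_WIDTH);
endmodule

module define_vs_parameter;
  fifo_a       A ();      // keeps its default width of 8
  fifo_b #(64) B ();      // parameter overridden for this instance only
endmodule
```

Each instance can carry its own value of width, whereas every use of `BUS_WIDTH expands to the same text, whichever module it appears in.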

1.3.5 What are the pros and cons of specifying the parameters 
using the defparam construct vs. specifying during 
instantiation? 

The advantages of specifying parameters during instantiation are: 

• Values don’t need to be specified for all the parameters. Only 
those parameters that are assigned new values need to be specified. 
The unspecified parameters retain the default values specified 
within the module definition. 

• The order of specifying the parameters is not relevant anymore, since the 
parameters are directly specified and linked by their name. 

The disadvantage of specifying parameters during instantiation is: 

• This has a lower precedence when compared to assigning using 
defparam. 

The advantages of specifying parameter assignments using defparam 
are: 

• This method always has precedence over specifying parameters during 
instantiation. 

• All the parameter value override assignments can be grouped inside one 
module and together in one place, typically in the top-level testbench 
itself. 

• When multiple defparams for a single parameter are specified, the 
parameter takes the value of the last defparam statement encountered in 
the source if, and only if, the multiple defparams are in the same file. If 
there are defparams in different files that override the same parameter, 
the final value of the parameter is indeterminate. 

The disadvantages of specifying parameter assignments using defparam 
are: 

• The parameter is typically specified by the scope of the hierarchies 
underneath which it exists. If a particular module gets ungrouped in its 
hierarchy (sometimes necessary during synthesis), then the scope used to 
specify the parameter is lost, and the parameter override is left unspecified. 

For example, suppose a module is instantiated in a simulation testbench, and 
its internal parameters are then overridden using hierarchical defparam 
constructs (for example, defparam U1.U_fifo.width = 32;). 
Later, when this module is synthesized, the internal hierarchy within U1 
may no longer exist in the gate-level netlist, depending upon the 
synthesis strategy chosen. Post-synthesis simulation will therefore fail 
on the hierarchical defparam override. 

See the earlier FAQ 1.3.1.2 for additional disadvantages of defparam and 
why this construct should not be used. 

1.3.6 What is the difference between the specparam and parameter 
constructs? 

The specparam is a special kind of parameter that is intended to specify 
only timing and delay values. The key differences in using the 
specparam and the parameter constructs are: 

Table 1-7. Differences between specparam and parameter 

specparam | parameter 
--------- | --------- 
Can be defined within both the module and the specify block | Must be defined outside the specify block and within the module 
A specparam can be assigned using another specparam or parameter or a combination of both | A parameter cannot be assigned the value of a specparam 
Value is overridden using SDF annotation | Can be overridden during instantiation or using defparam 
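
As a sketch of how a specparam is typically used, the following hypothetical buffer declares its rise and fall delays as specparams inside a specify block. The module names (buf_with_delay, specparam_demo), the delay values, and the stimulus are assumptions made for illustration.

```verilog
module buf_with_delay (out, in);
  output out;
  input  in;

  buf b1 (out, in);

  specify
    // Timing values only; these are the values SDF annotation may replace.
    specparam t_rise = 3, t_fall = 4;
    (in => out) = (t_rise, t_fall);   // module path delay from in to out
  endspecify
endmodule

module specparam_demo;
  reg  in;
  wire out;

  buf_with_delay U0 (out, in);

  initial begin
    $monitor("t = %0t, in = %b, out = %b", $time, in, out);
    in = 0;
    #10 in = 1;
    #10 in = 0;
    #10 $finish;
  end
endmodule
```

Unlike width or depth in the earlier examples, t_rise and t_fall cannot be changed with #( ) at instantiation or with defparam.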



1.3.7 What are derived parameters? When are derived parameters 
useful, and what are their limitations? 

When one or more parameters are used to define another parameter, then 
the result is a derived parameter. The derived parameter can be either of the 
type parameter or localparam. In the following example, two parameters, 
width and depth, can be used to define a third parameter, num_bits. In 
this case, the num_bits takes a value of 32. 

module derived_param; 
  parameter width = 4; 
  parameter depth = 8; 

  // num_bits is a derived parameter 
  localparam num_bits = width * depth; 
endmodule 

The advantages of using derived parameters are: 

• Makes the RTL code reusable 

• Enables use of the shorter name of num_bits instead of completely 
specifying (width * depth) 

A consequence of using derived parameters is that they can be indirectly 
overridden: overriding the parameters they depend on (for example, through 
defparam constructs) changes the derived value as well. This holds even for 
a localparam, which cannot be overridden directly. So, localparam 
constructs should be used with care when defining derived parameters. 
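The indirect override can be sketched as follows. The testbench module tb_derived and the instance names are hypothetical; the values in the comments follow directly from width * depth.

```verilog
module derived_param;
  parameter  width = 4;
  parameter  depth = 8;
  localparam num_bits = width * depth;  // derived parameter

  initial $display("%m: num_bits = %0d", num_bits);
endmodule

// Hypothetical testbench, for illustration only.
module tb_derived;
  derived_param U_default ();            // num_bits = 4 * 8  = 32
  derived_param #(.width(16)) U_wide (); // num_bits = 16 * 8 = 128

  // "defparam U_wide.width = 16;" would have the same indirect effect.
  // A direct override such as "defparam U_wide.num_bits = 64;" is
  // illegal, because num_bits is a localparam.
endmodule
```

Although num_bits itself is protected, its value still changes from instance to instance whenever width or depth is overridden.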

1.4 Ports 

The following section discusses a few questions about the usage of ports, 
the pros and cons of the different approaches to port connections, and 
what’s new in SystemVerilog regarding ports. 

1.4.1 What are the different approaches of connecting ports in a 
hierarchical design? What are the pros and cons of each? 

While instantiating the sub-modules in a given hierarchy, the port 
connections to those modules can be done in one of five ways: 

1.4.1.1 Ordered port connection 

In this method, the port expressions listed for the module instance shall 
be in the same order as the ports listed in the module declaration, that is, 
the first element in the list is connected to the first port declared, the 
second element to the second port, and so on. For example, in the code below, 
the upper module instantiates a lower module, and the ports are implicitly 
connected, that is, the connection is based on order and position. 

module lower (addr, data); 
  input [width-1:0] addr; 
  inout [depth-1:0] data; 
endmodule // lower 

module upper (in1, out1); 
  input  [width-1:0] in1; 
  output [depth-1:0] out1; 

  lower U0 (in1, out1); // implicit connection of 
                        // in1 to addr and out1 to data ports 
endmodule // upper 

1.4.1.2 Named port connection 

In this method, the connection between the ports can be done explicitly 
by linking the two names for each side of the connection, that is, the port 
declaration name from the module declaration can be linked to the name 
used in the instantiating module. The same example as above would be 
connected using the named port connection as follows. Note that the order of 
port connection is changed. However, it is recommended to keep the same 
order for reusability and readability. 

lower U1 ( 
  .data(out1), // Order is changed, 
               // connection is by name 
  .addr(in1)   // only and not position 
); 



The two main advantages of this method are: 

• It improves readability of the connections without having to refer to the 
port list of the instantiated module as the names from both sides are 
explicitly specified. 

• The order of port connections is not relevant anymore since they are 
explicitly connected. 

Note that the two types of module port connections cannot be mixed, 
that is, all the connections to the ports of a particular module instance shall 
be either by order or by name. For example, the following is incorrect: 

// gives a syntax error 
lower U_wrong (in1, .addr(out1)); 

1.4.1.3 Implicit .* port connection 

This is a feature available from SystemVerilog only. The new construct of 
specifying .* during module instantiation implicitly connects the ports of 
the instantiated module with the wires in the instantiating module. The 
precondition is that the names and sizes need to match exactly. For example, 
in the following code, the upper module instantiates two lower modules, U1 
and U2. The .* is equivalent to specifying the three connections of in1, in2, 
and in3 between the lower and upper modules. 

module lower (in1, in2, in3, out1, out2); 
  input  [7:0] in1, in2, in3; 
  output [7:0] out1, out2; 

  assign out1 = in1 & in2; 
  assign out2 = in1 | in3; 
endmodule // lower 

module upper (in1, in2, in3, u_out11, u_out12, 
              u_out21, u_out22); 
  input  [7:0] in1, in2, in3; 
  output [7:0] u_out11, u_out12; 
  output [7:0] u_out21, u_out22; 

  wire [7:0] u_out11, u_out12; 
  wire [7:0] u_out21, u_out22; 

  // Instantiating lower 
  lower U1 ( 
    .*, // .* does .in1(in1), .in2(in2), .in3(in3) 
    .out1(u_out11), 
    .out2(u_out12) 
  ); 

  // Instantiating lower 
  lower U2 ( 
    .*, // .* does .in1(in1), .in2(in2), .in3(in3) 
    .out1(u_out21), 
    .out2(u_out22) 
  ); 
endmodule // upper 

The synthesized logic of the above code instantiates the two lower 
modules, U1 and U2. The correct port connections are also established for 
the ports in1, in2, and in3. 

The advantage of the above method is that there is less chance of errors 
during instantiation, and it avoids repetition of names that implicitly match. 
Wherever exceptions and deviations exist, they need to be explicitly 
specified. In the above, the connections to u_out11, u_out12, u_out21, and 
u_out22 were made explicit. 

The issue with the above method is that the user will not be able to 
physically “see” the connections. 

1.4.1.4 Implicit .name port connection 

This is a feature available from SystemVerilog only. This new construct 
specifies the port name only once with the “.name” convention, where 
“name” is the port name. This avoids specifying the port name twice when 
the port name and signal name are the same. The instance port name and size 
should match the connecting variable name and size during module 
instantiation. In the following example, the ports in1, in2, and in3 of both 
the instances of the lower module don’t have any connecting variable port name. 

module lower (in1, in2, in3, out1, out2); 
  input  [7:0] in1, in2, in3; 
  output [7:0] out1, out2; 

  assign out1 = in1 & in2; 
  assign out2 = in1 | in3; 
endmodule 

module upper (in1, in2, in3, u_out11, u_out12, u_out21, 
              u_out22); 
  input  [7:0] in1, in2, in3; 
  output [7:0] u_out11, u_out12; 
  output [7:0] u_out21, u_out22; 

  wire [7:0] u_out11, u_out12; 
  wire [7:0] u_out21, u_out22; 

  // Instantiating lower 
  lower U1 ( 
    .in1, // no variable port name to these 
    .in2, // instance port names 
    .in3, 
    .out1(u_out11), 
    .out2(u_out12) 
  ); 

  // Instantiating lower 
  lower U2 ( 
    .in1, // no variable port names to 
    .in2, // these instance port names 
    .in3, 
    .out1(u_out21), 
    .out2(u_out22) 
  ); 
endmodule 

The synthesized logic of the above code instantiates the two lower 
modules, U1 and U2. The correct port connections are also established for 
the ports in1, in2, and in3. 

The advantage of the above method is that there is less chance of errors 
during instantiation, and it avoids repetition of names that implicitly match. 
Wherever exceptions and deviations exist, they need to be explicitly 
specified. In the above, the connections to u_out11, u_out12, u_out21, and 
u_out22 were made explicit. 

1.4.1.5 Interface port connection 

SystemVerilog has introduced the interface construct, which basically 
encapsulates a bundle of nets and variables into one group. When there are 
numerous ports that need to be connected to each other, it is easier to make 
the connections through the interface construct. This helps create less 
verbose and more maintainable code by grouping all common connections in 
just one place. Any future changes can be made in the interface definition, 
and they will propagate to all the instances where the interface is used. 
The above example is illustrated using the interface construct as follows: 
interface basic_con; 
  wire [7:0] in1, in2, in3; // bi-dir wires 
endinterface : basic_con 

module lower (basic_con all_ins, // all inputs 
              output [7:0] out1, out2); 
  assign out1 = all_ins.in1 & all_ins.in2; 
  assign out2 = all_ins.in1 | all_ins.in3; 
endmodule 

module upper (in1, in2, in3, u_out11, u_out12, 
              u_out21, u_out22); 
  input  [7:0] in1, in2, in3; 
  output [7:0] u_out11, u_out12; 
  output [7:0] u_out21, u_out22; 

  wire [7:0] u_out11, u_out12; 
  wire [7:0] u_out21, u_out22; 

  basic_con top_ins(); 

  assign top_ins.in1 = in1; 
  assign top_ins.in2 = in2; 
  assign top_ins.in3 = in3; 

  // Instantiating lower 
  lower U1 ( 
    // top_ins does .in1(in1), .in2(in2), .in3(in3) 
    top_ins, 
    u_out11, 
    u_out12 
  ); 

  // Instantiating lower 
  lower U2 ( 
    // top_ins does .in1(in1), .in2(in2), .in3(in3) 
    top_ins, 
    u_out21, 
    u_out22 
  ); 
endmodule 

In the above example, top_ins is the interface instantiation that 
served the purpose of specifying the port connections of the in1 to in3 ports. 
Some of the salient points of the above example are: 

• The above example could also be extended to make inter-module 
connections, that is, connections to and from U1 and U2. 

• The interface specification and the explicit port connections can be 
mixed within a single instantiation. 

1.4.2 Can there be full or partial no-connects to a multi-bit port of 
a module during its instantiation? 

No. There cannot be full or partial no-connects to a multi-bit port of a 
module during instantiation. For example, the following instantiation with an 
intermediate bit left to float is illegal, and gives a syntax error: 

// Instantiating lower with some port bits 
// unconnected 
lower U1 ( 
  .in1(u_in1), 
  // bit 6 for in2 is floating in 8 bit in2 
  .in2({u_in2[7], , u_in2[5:0]}), // Error 
  // bits [5:3] for out1 are unconnected in 8 bit 
  // out1 
  .out1({u_out1[7:6], , u_out1[2:0]}), // Error 
  .out2(u_out2) 
); 



Where there is a genuine need to leave particular bits unconnected, those 
bits must be connected to an unused wire, and the concatenation continued 
with the appropriate bits to be connected. For example, in the above 
situation, the following two additional declarations, together with the 
connections shown after them, are legal syntax: 

wire       unused1; 
wire [2:0] unused2; 

// Instantiating lower with unused wires on the no-connect bits 
lower U1 ( 
  .in1(u_in1), 
  .in2({u_in2[7], unused1, u_in2[5:0]}), 
  .out1({u_out1[7:6], unused2[2:0], u_out1[2:0]}), 
  .out2(u_out2) 
); 



Note that an undriven wire on an input, such as unused1, floats and will 
cause a “z” to propagate into the logic. The outputs will drive values onto 
their unused wires, but these wires do not fan out to other logic, and will 
be optimized away by synthesis tools. 

1.4.3 What happens after synthesis to the logic that is driving an 
output port that is left open (that is, no-connect) during its 
module instantiation? 

An unconnected output port in simulation will drive a value, but this 
value does not propagate to any other logic. In synthesis, the cone of any 
combinatorial logic that drives the unconnected output will get optimized 
away during boundary optimisation, that is, optimization by synthesis tools 
across hierarchical boundaries. 

In the following example, the module lower1 is instantiated into an upper1 
module, and the same pins are connected all the way to the top level. When 
this code is synthesized, it will produce the logic as shown in Figure 1-1. 

module lower1 (in1, in2, out1, out2); 
  input  in1, in2; 
  output out1, out2; 

  assign out1 = in1 & in2; 
  assign out2 = in1 | in2; 
endmodule 

module upper1 (u_in1, u_in2, 
               u_out1, u_out2); 
  input  u_in1, u_in2; 
  output u_out1, u_out2; 

  lower1 U1 ( 
    .in1(u_in1), 
    .in2(u_in2), 
    .out1(u_out1), 
    .out2(u_out2) 
  ); 
endmodule 



Figure 1-1. No unconnected ports 

When out2 is left unconnected during the instantiation of the lower1 
module (that is, this port does not go all the way to the top level as 
u_out2), as shown in Figure 1-2, the logic gets optimized with only the AND 
gate remaining, and the OR gate getting optimized away. 

Similarly, when out1 is left unconnected during the instantiation of the 
lower1 module, the OR gate remains, driving out2 all the way to the top 
level, and the AND gate gets optimized away. 

Figure 1-2. Gate eating behind an unconnected output port (before and after) 
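An instantiation that leaves an output open can be sketched as follows, assuming the lower1 module shown earlier, in which out1 is the AND output and out2 is the OR output. The wrapper name upper1_open is hypothetical.

```verilog
// Hypothetical variant of upper1 in which out2 of lower1 is left open.
module upper1_open (u_in1, u_in2, u_out1);
  input  u_in1, u_in2;
  output u_out1;

  lower1 U1 (
    .in1 (u_in1),
    .in2 (u_in2),
    .out1(u_out1),
    .out2()        // explicitly left open: the OR gate inside lower1
                   // has no load and is a candidate for removal
  );
endmodule
```

Writing the empty parentheses makes the no-connect explicit, so a reader can tell it apart from a port that was simply forgotten. Whether the gate is actually removed depends on the synthesis tool and on whether optimization across the hierarchical boundary is enabled.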



1.4.4 What value is sampled by the logic from an input port that is 
left open (that is, no-connect) during its module 
instantiation? 

By default, an unconnected input port is a floating port, and hence shows 
“z” during simulation. The logic following it will also propagate the “z”, 
until gated off by an AND gate. The following figure shows in1 floating in 
the lower instantiation. 

Since in1 was used as a logic input to both the gates, and is no longer 
driving either of them, the logic gets optimized and simplified into a simple 
wire connection between in2 and out2. This connection still maintains the 
AND’ing logic required between these two ports, as per its design. 

During synthesis, it is recommended to remove the unconnected ports 
using the synthesis tool commands, as they could potentially be undesirable 
during back-end processing. 

Figure 1-3. When one of the inputs is floating (before and after) 

The default value of z for unconnected input ports can be changed using 
the compiler directives: 

`unconnected_drive pull0 

and 

`unconnected_drive pull1 

The first directive causes all unconnected input ports to be pulled down 
to a logic 0. The second directive causes all unconnected input ports to be 
pulled up to logic 1. The effect of the `unconnected_drive directives 
can be turned off with the compiler directive `nounconnected_drive. For 
example: 

`unconnected_drive pull1 

module ttl_74ls74 (clk, d, rst, pre, q, qb); 
  input clk, d, rst, pre; // will pull up if left 
                          // unconnected 
  output q, qb; 

endmodule 

`nounconnected_drive // for the rest of the code 
1.4.5 How is the connectivity established in Verilog when 
connecting wires of different widths? 

When connecting wires or ports of different widths, the connections are 
right-justified, that is, the rightmost bit on the RHS gets connected to the 
rightmost bit of the LHS and so on, until the MSB of either of the nets is 
reached. For example, 

wire [7:0] net1; 
wire [3:0] net2; 

assign net1 = net2; 
// implicitly net1[3:0] are connected to 
// net2[3:0]; net1[7:4] receive no bits from net2 
// (a continuous assignment zero-extends them) 

assign net2 = net1; 
// The wires net1[3:0] are still connected to 
// net2[3:0]; net1[7:4] are truncated 

Note, however, that some simulation and synthesis tools will give a 
warning when connecting nets or ports of dissimilar widths. 
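The right-justified behaviour can be observed with a small simulation. This is a sketch: the module width_demo and its signal names are hypothetical. For a continuous assignment between unsigned vectors, the narrower right-hand side is zero-extended, and a wider right-hand side is truncated to its lower bits.

```verilog
// Hypothetical testbench, for illustration only.
module width_demo;
  reg  [3:0] narrow;
  reg  [7:0] wide_src;
  wire [7:0] wide;
  wire [3:0] trunc;

  assign wide  = narrow;    // 4-bit RHS into 8-bit LHS
  assign trunc = wide_src;  // 8-bit RHS into 4-bit LHS

  initial begin
    narrow   = 4'b1010;
    wide_src = 8'b1100_0101;
    #1;
    $display("wide  = %b", wide);   // prints: wide  = 00001010
    $display("trunc = %b", trunc);  // prints: trunc = 0101
  end
endmodule
```

Making the intent explicit, for example with {4'b0, narrow} or wide_src[3:0], avoids the tool warnings and documents which bits are meant to be dropped or padded.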

1.4.6 Can I use a Verilog function to define the width of a multi-bit 
port, wire, or reg type? 

The width elements of port, wire, or reg declarations require a constant 
for both the MSB and the LSB. Before Verilog-2001, it was a syntax error to 
specify a function call to evaluate the value of these widths. For example, 
the following code is erroneous before Verilog-2001: 

reg [get_high(val1, val2) : get_low(val3, val4)] reg1; 

In the above example, get_high and get_low are both function calls 
that evaluate to a constant result for the MSB and LSB respectively. 

However, Verilog-2001 allows the use of a constant function call to 
evaluate the MSB or LSB of a width declaration. 
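A common use of this Verilog-2001 feature is to size an address register from a depth parameter. The sketch below assumes a hypothetical module const_func_width with a DEPTH parameter; the clogb2 function follows the widely used ceiling-log2 idiom and is not taken from the original text.

```verilog
// Hypothetical example module, for illustration only.
module const_func_width;
  parameter DEPTH = 256;

  // Constant function: its result depends only on its argument, so it
  // can be evaluated at elaboration time.
  function integer clogb2;
    input integer value;
    begin
      value = value - 1;
      for (clogb2 = 0; value > 0; clogb2 = clogb2 + 1)
        value = value >> 1;
    end
  endfunction

  // Verilog-2001: the MSB is computed by a constant function call.
  reg [clogb2(DEPTH)-1:0] addr;  // [7:0] when DEPTH = 256
endmodule
```

The function qualifies as constant because it uses only its own input and local state, with no hierarchical references, system tasks, or delays.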
SUMMARY 

The chapter discussed a few basic questions on the usage of Verilog 
constructs in assignments, tasks, functions, ports, and parameters. It also 
discussed the different approaches to parameter and port specifications. 
A few SystemVerilog enhancements to tasks and functions have also been 
discussed. The next chapter discusses how the Verilog constructs are used 
in the synthesis context. 




Chapter 2 

RTL DESIGN 



INTRODUCTION 

The chapter aims to address issues of the Verilog HDL that pertain to 
RTL design and logic synthesis. The focus is, in particular, on questions of 
logic inferences during synthesis, and static timing implications. The chapter 
concludes with explorations of power and DFT issues. 

2.1 Assignments 

This section discusses how the different assignments in Verilog are done, 
and what their implications are. The logic inferences of these different 
assignments are also discussed in this section. 

2.1.1 What logic is inferred when there are multiple assign 
statements targeting the same wire? 

It is illegal to specify multiple assign statements to the same wire in 
synthesizable code that will become an output port of the module. The 
synthesis tools give an error that a net is being driven by more than 
one source. For example, the following is illegal: 

wire tmp; 

assign tmp = in1 & in2; // only one type of 
                        // output assignment is 
assign tmp = in1 | in2; // legal for synthesis 
However, it is legal to drive a three-state wire by multiple assign 
statements, as shown in the following example: 

input enable1, enable2; 
wire tmp; 

assign tmp = (enable1 == 1'b1) ? 
             (in1 & in2) : 1'bz; 
assign tmp = (enable2 == 1'b1) ? 
             (in3 | in4) : 1'bz; 

2.1.2 What do conditional assignments get inferred into? 

Conditionals in a continuous assignment are specified through the “?:” 
operator. Conditionals get inferred into a multiplexor. For example, the 
following is the code for a simple multiplexor: 

wire wire1; 

assign wire1 = (sel == 1'b1) ? a : b; 

Figure 2-1. Conditionals infer into a multiplexor 

2.1.3 What is the logic that gets synthesized when conditional 
operators in a single continuous assignment are nested? 

Conditional operators in a single continuous assignment can be nested as 
shown in the following example. The logic gets elaborated into a tree of 
multiplexors. 

input sel1, sel2, sel3, in1, in2, in3, in4; 
output out1; 
assign out1 = (sel1 == 1'b1) ? in1 : 
              (sel2 == 1'b1) ? in2 : 
              (sel3 == 1'b1) ? in3 : in4; 

In the multiplexor units shown, when sel is high, the output Z selects A; 
otherwise it selects B. 

Figure 2-2. Tree of multiplexors inferred from nested conditionals 

2.1.4 What value is inferred when multiple procedural assignments 
made to the same reg variable in an always block? 

When there are multiple nonblocking assignments made to the same reg 
variable in a sequential always block, then the last assignment is picked up 
for logic synthesis. For example, 

module lower (clk, in1, in2, out2); 
  input clk, in1, in2; 
  output out2; 
  reg tmp; 

  always @(posedge clk) begin 
    tmp <= (in1 ^ in2); 
    tmp <= (in1 & in2); 
    tmp <= (in1 | in2); 
  end 

  assign out2 = tmp; 
endmodule 

Figure 2-3. Multiple assignments to the same reg variable 

In the example just shown, it is the OR logic that is the last assignment. 
Hence, the logic synthesized was indeed the OR gate. Had the last 
assignment been the “&” operator, it would have synthesized an AND gate. 

Note that the optimised synthesis results match the simulation behaviour. 
The IEEE Verilog standard defines that nonblocking assignments in a 
begin...end block will be assigned in the order listed. Hence, in 
simulation, only the value of the last assignment is seen. 

Note, also, that the rules discussed and shown in this section apply when 
the variable on the LHS is not used on the RHS of subsequent assignments. 
The behaviour and synthesis implication for when a variable is used on both 
the LHS and RHS is discussed in the next FAQ. 

The same would be the case for a combinatorial always block, too. For 
example, 

always @(in1, in2) begin 
  tmp = (in1 & in2); 
  tmp = (in1 ^ in2); 
  tmp = (in1 | in2); // The final logic picked 
                     // up is the OR gate 
end 

Since multiple assignments to the same variable are legal, the user has to 
keep track of the statements, as to which is the final assignment required. 
If only one among the multiple assignments is to be selected, it would 
typically be in an if-else tree or a case statement. For example, the above 
always block would typically be written as follows, in which case only 
one unique assignment is executed at each clock cycle. 

always @(posedge clk) begin 
  if (sel1) 
    tmp <= (in1 | in2); 
  else if (sel2) 
    tmp <= (in1 & in2); 
  else 
    tmp <= (in1 ^ in2); 
end 

In the above example, there is no ambiguity as to which statement gets 
selected, as the branching controls are clearly defined. 
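The case statement alternative mentioned above can be sketched in the same way. The module name sel_case and the 2-bit encoded select sel are assumptions made for this illustration; the original example uses separate sel1 and sel2 controls.

```verilog
// Hypothetical module: a case-based selection of a single assignment.
module sel_case (clk, sel, in1, in2, tmp);
  input        clk;
  input  [1:0] sel;       // assumed 2-bit encoded select
  input        in1, in2;
  output       tmp;
  reg          tmp;

  always @(posedge clk) begin
    case (sel)
      2'b00   : tmp <= (in1 | in2);
      2'b01   : tmp <= (in1 & in2);
      default : tmp <= (in1 ^ in2);
    endcase
  end
endmodule
```

Unlike the if-else tree, the case branches are mutually exclusive by construction, so no priority between the select conditions is implied.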

2.1.5 Why should a nonblocking assignment be used for sequential 
logic, and what would happen if a blocking assignment were 
used? Compare it with the same code in a combinatorial 
block. 

As discussed in Chapter 1, the main difference between the blocking and 
nonblocking assignment is that, in the blocking assignment, the RHS 
immediately gets assigned to the LHS, whereas for the nonblocking 
assignment, the assignment to the LHS is scheduled after the RHS is 
evaluated. 

The following sections illustrate the different scenarios of using blocking 
and nonblocking assignments in sequential code. 

2.1.5.1 Using blocking statements in a sequential logic 

The following is an example of a Verilog module in which the blocking 
assignments have been used in the sequential block. 

module reg_test (clk, in1, out1); 
  input clk, in1; 
  output out1; 

  reg reg1, reg2, reg3, out1; 

  always @(posedge clk) begin 
    reg1 = in1; 
    reg2 = reg1; 
    reg3 = reg2; 
    out1 = reg3; 
  end 
endmodule 
In the above example, the assignments to reg1, reg2, reg3, and out1 
have been made as blocking assignments. The synthesized result is a 
single FF, with a d input of in1 and a q output of out1, as shown in the 
following figure: 

Figure 2-4. Logic inference with blocking assignments in sequential block 

This is because the intermediate results between in1 and out1 were 
stored in reg1, reg2, and reg3 in a blocking format. As a result, the 
evaluation of the final result to out1 didn’t require waiting for all the 
events of the RHS to be completed. Rather, they were immediately assigned to 
the LHS in the order specified. Observe that the signals reg1, reg2, and 
reg3 have been optimised away by synthesis. 

2.1.5.2 Using nonblocking statements in a sequential logic 



The following illustration of code uses the nonblocking assignments in a 
sequential block: 



module reg_test (clk, in1, out1); 
  input clk, in1; 
  output out1; 

  reg reg1, reg2, reg3, out1; 

  always @(posedge clk) begin 
    reg1 <= in1; 
    reg2 <= reg1; 
    reg3 <= reg2; 
    out1 <= reg3; 
  end 
endmodule 



In the above example, the assignments to reg1, reg2, reg3, and out1 
have been made as nonblocking assignments. The synthesized result 
is the inference of as many FFs as specified in the always block (in this 
case, 4 FFs). 

Figure 2-5. Using nonblocking assignments in sequential logic 

This is because the intermediate results between in1 and out1 were 
stored in reg1, reg2, and reg3 in a nonblocking format. As a result, 
the evaluation of the result to each individual reg required waiting for all 
the events of the RHS to be completed. In this case, it was the output of the 
previous register controlled by the clk event. As a result, the output is a 
shift register. 

2.1.5.3 Using blocking statements in a combinatorial logic 

The following example illustrates the use of blocking statements in 
combinatorial logic: 

module reg_test (clk, in1, out1); 
  input clk, in1; 
  output out1; 

  reg reg1, reg2, reg3, out1; 

  always @(in1) begin 
    reg1 = in1; 
    reg2 = reg1; 
    reg3 = reg2; 
    out1 = reg3; 
  end 
endmodule 



In the above example, the blocking assignments are made in a 
combinatorial block. Note the absence of posedge, and “<=” being replaced 
with “=” in the assignments. The logic synthesized out of this is a simple 
wire between in1 and out1. 

Figure 2-6. Blocking statements in combinatorial block 

This is because all the assignments have been immediate, and there is no 
event to wait upon. 
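The difference between the two sequential styles can also be observed in simulation. The following testbench is a sketch: the module name shift_compare and all signal names are hypothetical, and it places both styles side by side under the same clock.

```verilog
// Hypothetical testbench: module and signal names are illustrative only.
module shift_compare;
  reg clk, in1;
  reg b1, b2, b_out;   // written with blocking assignments
  reg n1, n2, n_out;   // written with nonblocking assignments

  // Blocking: each statement sees the value just written above it,
  // so b_out takes the value of in1 on the same clock edge.
  always @(posedge clk) begin
    b1    = in1;
    b2    = b1;
    b_out = b2;
  end

  // Nonblocking: every RHS is sampled before any LHS is updated,
  // so in1 needs three clock edges to reach n_out.
  always @(posedge clk) begin
    n1    <= in1;
    n2    <= n1;
    n_out <= n2;
  end

  always #5 clk = ~clk;

  initial begin
    clk = 0; in1 = 0;
    {b1, b2, b_out, n1, n2, n_out} = 6'b0;
    $monitor("t=%0t in1=%b b_out=%b n_out=%b", $time, in1, b_out, n_out);
    #2  in1 = 1;   // change the input away from a clock edge
    #40 $finish;
  end
endmodule
```

In the monitor output, b_out should change at the first rising clock edge after in1 changes, while n_out should follow two edges later, which mirrors the single flip-flop and the shift register inferred by synthesis.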

2.2 Tasks and Functions 

Tasks and functions are primarily constructs that help in the reuse of 
code that is needed in multiple places. Similar to the advantages seen in 
software programming, tasks and functions help in grouping statements 
with a particular intent in one code segment, and, hence, help in better 
readability and maintenance. 

2.2.1 What does the logic in a function get synthesized into? What 
are the area and timing implications of calling functions in 
RTL? 

Since a function does not have any construct in it that advances time, a 
function basically infers combinatorial logic. If the logic falls into the 
critical path of a design, it is important to write the function in a timing 
optimal fashion. 

For example, the following function does an arithmetic operation using 
two inputs and a control. Its result is used in another expression, that calls 
the function. 

module lower (in1, in2, out1, out2, out3, out4); 
  input  [1:0] in1, in2; 
  output [1:0] out1, out2, out3, out4; 
  wire   [1:0] out1, out2, out3, out4; 

  function [1:0] arith;    // declared in Verilog 1995 
    input [1:0] in1;       // format with each input 
    input [1:0] in2;       // declared separately after 
    input [1:0] operation; // the function 
    begin 
      case (operation) 
        2'b00   : arith = in1 & in2; // AND gate 
        2'b01   : arith = in1 | in2; // OR gate 
        2'b10   : arith = in1 ^ in2; // XOR gate 
        2'b11   : arith = in1 % in2; // mod operator 
        default : arith = in1 & in2; 
      endcase 
    end 
  endfunction 

  assign out1 = arith (in1, in2, 0); // AND gate 
  assign out2 = arith (in1, in2, 1); // OR gate 
  assign out3 = arith (in1, in2, 2); // XOR gate 
  assign out4 = arith (in1, in2, 3); // mod operator 
endmodule 

Whether repeated calls to a function replicate the logic within the 
function or multiplex it depends upon the paths where the function is used. 
If the calls to a function are used in different paths, the logic gets 
replicated. In the above example, each output used the same function 
differently, and, hence, each function call implemented its own independent 
logic. If out1 and out2 both required the OR gate functionality, then the 
two function calls would share common logic for both the outputs, that is, 
in effect, out1 and out2 would be connected to the same OR gate. However, 
any constant propagation techniques (see the area optimisation techniques 
later in this chapter for what constant propagation is) used within the 
function could influence the area. 

Note that the above function call can be declared in the Verilog-2001 
format, with the keyword input being part of function declaration, as 
follows: 

function [1:0] arith // no semicolon here 
  (input [1:0] in1, in2,  // the inputs are now part 
   input [1:0] operation  // of the function decl 
  ); // multiple inputs like in1 and in2 in one decl 
Just like any other combinatorial logic, when the endpoint of the 
function is used as a D input to a flip-flop, the function gets used to 
synthesize the sequential logic, too. For example, in the above code, the 
output out1 was a combinatorial output. If it is made a registered output, 
then the function output is used to drive the flip-flop, as illustrated in 
the following example: 

reg [1:0] out1; // instead of wire 

always @(posedge clk or negedge reset) 
  if (!reset) begin 
    out1 <= 0; 
  end else begin 
    out1 <= arith(in1, in2, 0); // function call 
  end 

2.2.2 What are a few important considerations while writing a 
Verilog function? 

The following are a few considerations while writing a Verilog function: 

• Local variables within a function and the function return value should 
be assigned values each time the function is called. Otherwise, the 
function behaves as if a latch had been formed, because these variables 
are not assigned on every entry to the function. For example, the if 
condition within the following example does not have an else clause. 
Because the function is static in simulation, it will behave as latched 
logic. That is, if sel is false, the function will return the value of its 
previous call, as if the result were latched. Synthesis, however, still 
does not infer a latch. It simply infers a gated function. 

module lower (in1, in2, sel, out1, out2); 
  input in1, in2, sel; 
  output out1, out2; 

  wire out1, out2; 

  function bad_latch; 
    input in1, sel; 
    begin 
      if (sel) 
        bad_latch = in1; 
    end 
  endfunction 

  assign out1 = bad_latch(in1, sel); 
  assign out2 = bad_latch(in2, sel); 
endmodule // lower 

In the above function calls, there are no variables to be initialized, and 
the logic inferred is the gating function, as illustrated in this figure: 

Figure 2-7. Function variables need to be assigned 
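One way to keep simulation and synthesis consistent is to give the return value a default before the conditional. The sketch below is an illustration only: the names lower_fixed and gated are hypothetical, and the default of 1'b0 is an assumed choice rather than something the original text prescribes.

```verilog
// Hypothetical corrected version: the return value is always assigned.
module lower_fixed (in1, in2, sel, out1, out2);
  input  in1, in2, sel;
  output out1, out2;

  function gated;
    input in1, sel;
    begin
      gated = 1'b0;      // default value, assumed for illustration
      if (sel)
        gated = in1;     // overrides the default when sel is true
    end
  endfunction

  assign out1 = gated(in1, sel);
  assign out2 = gated(in2, sel);
endmodule
```

With the default in place, the function no longer depends on the value left over from a previous call, so the result for sel false is the same in simulation as in the synthesized gating logic.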

• Ensure that the width of the return value from a function is specified 
fully, else it will end up with a default of one bit. For example: 

module lower (in1, in2, all_outs); 
  input  [1:0] in1, in2; 
  output [7:0] all_outs; 

  wire [7:0] all_outs; 

  function arith; // should have been 
                  // function [7:0] arith 
    input [1:0] in1; 
    input [1:0] in2; 

    reg [1:0] out1, out2, out3, out4; 
    begin 
      out1 = in1 & in2; 
      out2 = in1 | in2; 
      out3 = in1 ^ in2; 
      out4 = in1 % in2; 
      arith = {out1, out2, out3, out4}; 
    end 
  endfunction 

  assign all_outs = arith(in1, in2); 
endmodule 

In the above example, the desired output was actually 8 bits, but since the 
width of [7:0] was not specified between the keyword function and the 
function-name, the value returned by the function call is only the last bit, 
that is, bit [0] of the actual intended result. 

• Functions are basically used to synthesize only combinatorial logic, 
however, the end result of this function can be used as a data input to 
the flip-flops, too. 

• Functions should not include the delay(#) or event control (@, wait) 
statements. 

• Functions may call other functions, but not other tasks. 

• A function returns a value when it is called. For more than one return 
item, there are two ways to deal with it. Before SystemVerilog, this 
could be achieved by concatenating the multiple values into the single 
return. In the previous example, the output arith is a concatenation of 
multiple outputs that need to be driven by a single function call. The 
desired output fields from the result are then derived to drive the 
required signals. For example, 

module lower (in1, in2, out1, out2, out3, out4); 
  input  [1:0] in1, in2; 
  output [1:0] out1, out2, out3, out4; 

  wire [1:0] out1, out2, out3, out4; 
  wire [7:0] all_outs; 

  function [7:0] arith; 
    input [1:0] in1; 
    input [1:0] in2; 

    reg [1:0] out1, out2, out3, out4; 
    begin 
      out1 = in1 & in2; 
      out2 = in1 | in2; 
      out3 = in1 ^ in2; 
      out4 = in1 % in2; 
      arith = {out1, out2, out3, out4}; 
    end 
  endfunction 

  assign all_outs = arith(in1, in2); 
  assign out1 = all_outs[7:6]; 
  assign out2 = all_outs[5:4]; 
  assign out3 = all_outs[3:2]; 
  assign out4 = all_outs[1:0]; 
endmodule 

In the above example, the different outputs out1 to out4 were all
concatenated and assigned to the function name. The different fields can
then be extracted from the wire that the function drives.

With SystemVerilog, it is possible to have a formal output and inout 
declaration. The same example in SystemVerilog is as follows: 

module funct_output (in1, in2, out1, out2,
                     out3, out4);

input [1:0] in1, in2;

output [1:0] out1, out2, out3, out4;

reg [1:0] out1, out2, out3, out4;

// void (that is, doesn't return anything)
function void arith;
input [1:0] in1, in2;
output [1:0] out1, out2, out3, out4;
begin
    out1 = in1 & in2;
    out2 = in1 | in2;
    out3 = in1 ^ in2;
    out4 = in1 % in2;
end

endfunction

always_comb // SystemVerilog construct
begin
    arith (in1, in2, out1, out2, out3, out4);
end

endmodule

• Parameters and integers can be declared within a function, but they
are local to that function and cannot be used outside the scope
of the function. In the following example, the width1 parameter
defined within the function double_width is not visible outside its
scope for the $display statement that follows later.

parameter width = 32;

function integer double_width;
input integer in_width;
parameter width1 = 64;
double_width = in_width * 2;
endfunction

initial begin
    $display ("width1 = %0d", width1); // syntax error
end

2.2.3 What does the logic in a task get synthesized into? Explain 
with an example. 

Although it is legal to have time advancing or controlling constructs like
@ within a task, they work only in simulation. Synthesis tools ignore all
timing constructs within a task. Hence, a simulation and synthesis mismatch
can occur if the functionality depends upon the presence of timing control
constructs within a task. Thus, a task can be used to synthesize basic
combinatorial logic. However, if the destination of the task call is a storage
element used within a sequential block, then a sequential element gets
synthesized. Whether the logic within the task is replicated each time it is
called depends upon the path where the task is used. If the task calls are on
independent paths that can be active concurrently, then independent logic
will be synthesized for each path, and the area grows linearly with the
number of task calls. If the task is used among common paths, then the logic
at its inputs or outputs could be reused, depending upon the path from which
the task is called.

The following is an example of a combinatorial task, modify_value, in the
module comb_task. The task performs a unary OR of its input and produces the
result at the output of the task. Note that the intermediate reg
declarations int_out1 and int_out2 were required, because the output
of a task can be received only by a reg and not a wire.

module comb_task (in1, in2, out1, out2);
input [3:0] in1, in2;
output out1, out2;

reg int_out1, int_out2;

task modify_value;
input [3:0] value;
output int_val;
reg int_val;
begin
    int_val = (|(value)); // Combinatorial
                          // operation in a task
end

endtask

always @(in1) begin
    modify_value (in1, int_out1);
end

always @(in2) begin
    modify_value (in2, int_out2);
end

assign out1 = int_out1;
assign out2 = int_out2;

endmodule

Many synthesis tools give a compilation error if sequential constructs are 
present within a task. 
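The case where the destination of a task call is a storage element can be sketched as follows. This is an illustrative variation of the example above, not code from the original text; the module name seq_task and the signal names clk and tmp are assumptions.

```verilog
// Sketch: the same combinatorial task, called from a clocked block.
module seq_task (clk, in1, out1);  // hypothetical module name
  input        clk;
  input  [3:0] in1;
  output       out1;

  reg          out1;
  reg          tmp;                // receives the task output

  task modify_value;
    input  [3:0] value;
    output       int_val;
    begin
      int_val = (|value);          // no timing control inside the task
    end
  endtask

  always @(posedge clk) begin
    modify_value(in1, tmp);        // combinatorial OR of in1
    out1 <= tmp;                   // registered on the clock edge
  end
endmodule
```

Here the task body itself remains combinatorial; the flip-flop is expected to come from the clocked always block that receives the result, with the OR logic feeding its D input.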

2.2.4 What are the differences between using a task, and defining a
module for implementing reusable logic?

We have already seen in the previous question that a task can be used to
call the same logic multiple times. Similarly, a module can also be defined,
and the logic within it will get replicated as many times as it is
instantiated. The following table summarizes the differences between the two
approaches:

Table 2-1. Table summarizing the difference between task and module

task                                    module
--------------------------------------  --------------------------------------
A task cannot instantiate a module      Fundamentally, tasks can be called
within it                               only within a module

                                        A module instantiation has a
                                        hierarchy that can be fully
                                        identified and placed as a block
                                        during floorplanning

Prior to the advent of Verilog-2001,    By definition, modules can be
tasks in Verilog were not re-entrant.   instantiated multiple times. Each
Therefore, if a task uses internal      instance will carry its own context,
local variables, invoking it multiple   including all of its internal
times in overlapping time-domains       registers and other variables.
can corrupt those variables             Therefore, processes within these
                                        instances are inherently concurrent
                                        among themselves and also across
                                        instances


2.2.5 Can tasks and functions be declared external to the scope of 
module-endmodule? 

Yes. With SystemVerilog, it is possible to declare the task and function 
definitions external to the scope of module-endmodule. This is not possible 
with Verilog-1995 or Verilog-2001, and will give a compilation error. For 
example, in the following code, the task modify_value is declared 
outside the scope of the module-endmodule. 



task modify_value;        // this task is defined
input [3:0] value;        // outside the scope of
output int_val;           // module-endmodule
reg int_val;
begin
    int_val = (|(value)); // Combinatorial operation
                          // in a task
end
endtask

module ext_task (in1, out1);
input [3:0] in1;
output out1;

reg int_out1;

always @(in1) begin
    modify_value (in1, int_out1);
end

assign out1 = int_out1;
endmodule // ext_task

Similarly, with SystemVerilog, a function-endfunction can also be 
declared outside the scope of module-endmodule within the same file. If 
these contents are defined in a separate file, it needs to be part of the same 
compilation command. 

2.3 Storage Elements 

There are primarily two kinds of storage elements inferred in logic
synthesis, that is, flip-flops and latches. This section describes the
implementation of the two elements and compares them.

2.3.1 Summary of RTL templates for different flip-flop types

Whether a flip-flop or a latch is inferred from RTL depends upon the style
in which the code is written. The following is a quick summary of a few
templates for different register and latch inferences. The flip-flop
templates infer a positive edge triggered flip-flop. If posedge clk is
replaced with negedge clk, then a negative edge triggered flip-flop is
inferred.

1. Simple D Flip-flop 

Positive edge triggered, no set or reset, value of Q is unknown at power
on

module dff (clk, d, q);
input clk, d;
output q;

reg q;

always @(posedge clk) begin
    q <= d;
end

endmodule

In SystemVerilog, the same code would be implemented with always_ff
in place of the always keyword, as follows:

always_ff @(posedge clk) begin
    q <= d;
end

The advantage of always_ff over always is that always_ff indicates that
the designer's intent is to model clocked sequential logic. Software tools can
then verify that the block's sensitivity list and functionality correctly
represent the type of logic intended.

2. Asynchronous set FF 

Positive edge triggered, active high asynchronous set 

module asff (clk, d, set, q);
input clk, d, set;
output q;

reg q;

always @(posedge clk or posedge set) begin
    if (set)
        q <= 1'b1;
    else
        q <= d;
end

endmodule

Replacing the always keyword with always_ff above would implement
the asynchronous FF in SystemVerilog.

3. Asynchronous reset FF

Positive edge triggered, active high asynchronous reset 

module arff (clk, d, reset, q);
input clk, d, reset;
output q;

reg q;

always @(posedge clk or posedge reset) begin
    if (reset)
        q <= 1'b0;
    else
        q <= d;
end

endmodule

Replacing the always keyword with always_ff above would implement
the asynchronous FF in SystemVerilog.

4. Asynchronous set and reset FF 

Positive edge triggered, active high asynchronous set and reset 

module arsff (clk, d, set, reset, q);
input clk, d, set, reset;
output q;

reg q;

always @(posedge clk or posedge set or
         posedge reset)
begin
    if (set)
        q <= 1'b1;
    else if (reset)
        q <= 1'b0;
    else
        q <= d;
end

endmodule

Replacing the always keyword with always_ff above would implement
the asynchronous FF in SystemVerilog.

5. Synchronous set FF 

Positive edge triggered, active high synchronous set 

module ssff (clk, d, set, q);
input clk, d, set;
output q;

reg q;

always @(posedge clk) begin
    if (set)
        q <= 1'b1;
    else
        q <= d;
end

endmodule

Replacing the always keyword with always_ff above would implement
the synchronous FF in SystemVerilog.

6. Synchronous reset FF 

Positive edge triggered, active high synchronous reset 

module srff (clk, d, reset, q);
input clk, d, reset;
output q;

reg q;

always @(posedge clk) begin
    if (reset)
        q <= 1'b0;
    else
        q <= d;
end

endmodule

Replacing the always keyword with always_ff above would implement
the synchronous FF in SystemVerilog.







7. Synchronous set and reset FF 

Positive edge triggered, active high synchronous set and reset 

module ssrff (clk, d, set, reset, q);
input clk, d, set, reset;
output q;

reg q;

always @(posedge clk) begin
    if (set)
        q <= 1'b1;
    else if (reset)
        q <= 1'b0;
    else
        q <= d;
end

endmodule

Replacing the always keyword with always_ff above would implement
the synchronous FF in SystemVerilog.

2.3.2 Summary of RTL templates for different Latch types 

1. Simple D Latch 

module dl (sel, d, q);
input sel, d;
output q;

reg q;

always @(sel, d) begin
    if (sel)
        q <= d;
end // Note the else clause is missing

endmodule

In Verilog-2001, the same latch can be implemented as:

always @(*) begin // note implicit sensitivity list
    if (sel)
        q <= d;
end

In SystemVerilog, the same latch can be implemented using the keyword
always_latch, as:

always_latch begin // no explicit sensitivity list
    if (sel)
        q <= d;
end

Note that it was not required to specify anything in the sensitivity list of
the always_latch block, as this procedure determines its sensitivity
automatically. One advantage of the always_latch keyword is that it
explicitly shows that the designer intends to model a latch. Software tools
can then check that the functionality within the procedural block correctly
represents latched logic. Another important advantage of always_latch is
that it is automatically evaluated once at simulation time 0, even if the sel
input did not change at time 0. This ensures that at the start of simulation,
the latch output correctly reflects the latch inputs.

2. Asynchronous set latch 

module asl (sel, d, set, q);
input sel, d, set;
output q;

reg q;

always @(sel, d, set) begin
    if (set)
        q = 1'b1;
    else if (sel)
        q = d;
end // Note the final else clause is missing

endmodule

In SystemVerilog, the same always procedure above can be implemented
using always_latch, instead of the always keyword, without any
sensitivity list.







3. Asynchronous reset latch 

module arl (sel, d, reset, q);
input sel, d, reset;
output q;

reg q;

always @(sel, d, reset) begin
    if (reset)
        q = 1'b0;
    else if (sel)
        q = d;
end // Note the final else clause is missing

endmodule

In SystemVerilog, the same always procedure above can be implemented
using always_latch, instead of the always keyword, without any sensitivity
list.

4. Asynchronous set and reset latch 

module asrl (sel, d, set, reset, q);
input sel, d, set, reset;
output q;

reg q;

always @(sel, d, set, reset) begin
    if (reset)
        q = 1'b0;
    else if (set)
        q = 1'b1;
    else if (sel)
        q = d;
end // Note the final else clause is missing

endmodule

In SystemVerilog, the same always procedure above can be implemented
using always_latch, instead of the always keyword, without any sensitivity
list.







A few salient points to be noted in the above inferences: 

• In asynchronous set or reset storage elements, the asynchronous input
has higher priority than the data input (hence, it is at the top of the
if-else tree). Therefore, when an asynchronous input and the data inputs
arrive at the same time, the effect of the asynchronous input prevails at
the output.

• All the above examples show a single bit implementation of the defined
storage element. When the bit width of the reg declaration is increased,
the number of flip-flops or latches inferred equals the width of the reg
declaration. For example,

reg [3:0] out1;

This will create 4 flip-flops or latches for out1.

• There need not be one always block for each flip-flop. Many flip-flops
can be inferred within an always block. But the restriction in this
approach is that ALL of the FF definitions within that always block will
infer the same type of flip-flop as defined in the sensitivity list of the
always block.

• Although normally all the bits of the storage elements are either set or
reset, it is not uncommon to assign values of 1'b1 and 1'b0 to the
different bits of the same register during the set or reset condition. For
example, in this 4 bit FF, the reset values of the flops are 4'b1010.

module lower (in1, clk, reset, out1);

input [3:0] in1;
input clk, reset;
output [3:0] out1;

reg [3:0] out1;

always @(posedge clk or negedge reset)
begin
    if (!reset)
        out1 <= 4'b1010;
    else
        out1 <= in1;
end

endmodule

• A common coding practice is to use only nonblocking assignments for
inferring flip-flops and latches, and only blocking for inferring
combinatorial logic.

• Using SystemVerilog frees the user from specifying the elements of the 
sensitivity list for the latch inferences. It also ensures that the latch 
output values are correct at the start of simulation. These features would 
reduce the possibility of simulation and synthesis mismatches. 
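The points above about several flip-flops sharing one always block, and about mixed reset values, can be combined in one sketch. The module and signal names below (multi_ff, reset_n, d0, d1, q0, q1) are hypothetical and chosen only for illustration.

```verilog
// Sketch: one always block inferring five flip-flops (1 + 4 bits).
// All of them share the clock edge and the active-low asynchronous
// reset named in the sensitivity list.
module multi_ff (clk, reset_n, d0, d1, q0, q1);  // hypothetical names
  input        clk, reset_n;
  input        d0;
  input  [3:0] d1;
  output       q0;
  output [3:0] q1;

  reg          q0;
  reg    [3:0] q1;

  always @(posedge clk or negedge reset_n) begin
    if (!reset_n) begin
      q0 <= 1'b0;        // single-bit flop resets to 0
      q1 <= 4'b1010;     // bits of one register reset to mixed values
    end else begin
      q0 <= d0;
      q1 <= d1;
    end
  end
endmodule
```

If one of the registers needed a different clock edge or a synchronous reset instead, it would have to move to its own always block.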

2.3.3 What are the considerations in choosing between
flip-flops and latches in a design?

Both latches and FFs have their relative advantages and disadvantages in 
their implications, as summarized in the table below: 



Table 2-2. Consideration of latch and flip-flop features for design choice

Latch                                   Flip-flop
--------------------------------------  --------------------------------------
Area of a latch is typically less       Area of a flip-flop for the same
than that of a flip-flop                features is more than that of a latch

Consumes less power, due to less        Power consumption is typically
switching activity and less area        higher, due to the area and free
                                        running clock. Additional controls
                                        are required to save power

Facilitates time borrowing or cycle     Since the clock boundaries are rigid,
stealing. Helps increase pipeline       the facility of time borrowing or
depth with less area. Even if the       cycle stealing doesn't exist with
path is longer than a clock cycle for   FFs. A negative slack cannot be
a latch based pipeline, it is okay as   propagated to the timing of the next
long as it meets the next latch         stage in the pipeline, and hence each
setup margin                            stage must execute within a clock
                                        period

In multiple clock schemes, the clock    Clock tree synthesis is less tedious
edges must not overlap. This makes      in FF based designs. Since the
the logic design, vector generation     stimulus needs to be stable before
for verification and clock tree         the setup time of the clock, the
synthesis difficult                     vector generation is relatively
                                        easier

With time borrowing* and cycle          Due to rigid timing boundaries, the
stealing, the operating frequency       slowest path pretty much decides the
can be higher than the slowest logic    operating frequency
path alone would permit

Makes time budgeting and                The time budgeting is clearer and
characterizing the interfaces tedious   characterizing the interface is
                                        easier

(*) Time borrowing is a mechanism in which a latch based design takes
advantage of the transparency between two back to back latches that are
enabled in order to meet the propagation delay between the two latches. This
is best illustrated by a simple analysis as follows:

Consider two latches L1 and L2. While both of them have the same
clock frequency, the enables for L1 and L2 are opposite in polarity. L1
is enabled in the high phase of the clock, while L2 is enabled in the low
phase of the clock. This connection is shown in the following figure:

[Figure: latches L1 and L2 (d, q, en) enabled by clk1 and clk2; waveform
annotations: 10 ns, Tpd = 7.5ns, time borrowed]

Figure 2-8. Illustration of time borrowing in latches

For the purpose of simplifying the analysis, the d-to-q delay or en-to-q
delay of L1 and L2 is assumed to be 0ns. The propagation delay of the
combinatorial logic is 7.5ns. If it were a flip-flop based design with the same
rising edge clock in place of clk1 and clk2, this would clearly be a setup
violation. However, in the latch based design above, since the delay through
the latch is 0ns, the input in1 is latched immediately at the output of L1, and
begins to propagate. The propagation delay enters into the ON time of the
second latch L2, and settles at some point during its ON time. The
propagation delay has caused the logic to borrow time from the second latch,
in order to settle its outputs, and hence is called time borrowing.
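
The analysis above can be tried out in simulation. The following is a behavioural sketch only, under stated assumptions: a single clock with an assumed 10 ns period whose high phase enables L1 and whose low phase enables L2, zero latch delay, and the 7.5 ns combinatorial delay modelled as a plain delayed buffer. The testbench name and signal names are hypothetical.

```verilog
`timescale 1ns/100ps
// Sketch: two opposite-phase latches with 7.5 ns of logic in between.
module tb_time_borrow;             // hypothetical testbench name
  reg  clk, in1;
  reg  q1, q2;
  wire comb;

  // L1: transparent while clk is high
  always @(clk or in1)
    if (clk) q1 <= in1;

  // Combinatorial logic between the latches, modelled as a 7.5 ns delay
  assign #7.5 comb = q1;

  // L2: transparent while clk is low
  always @(clk or comb)
    if (!clk) q2 <= comb;

  // Assumed 10 ns clock period: rising edges at 5, 15, 25, ...
  initial begin
    clk = 1'b0;
    forever #5 clk = ~clk;
  end

  initial begin
    in1 = 1'b0;
    #25 in1 = 1'b1;                // new data just as L1 opens
    #25 $finish;
  end

  initial
    $monitor("t=%0t clk=%b in1=%b q1=%b comb=%b q2=%b",
             $time, clk, in1, q1, comb, q2);
endmodule
```

With these assumed numbers, the new value leaves L1 at 25 ns and arrives at L2 at 32.5 ns, which is 2.5 ns after L2 became transparent at 30 ns. Those 2.5 ns are the time borrowed from the second latch's phase.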

2.3.4 Which one is better, asynchronous or synchronous reset for
the storage elements?

The following table summarizes the comparison between using 
synchronous and asynchronous reset logic for a design: 



Table 2-3. Summary of differences between asynchronous and synchronous
reset

Asynchronous reset                      Synchronous reset
--------------------------------------  --------------------------------------
Reset signal is not a part of the       Reset signal is part of the data
data path, that is, not a part of the   path, that is, the D input of the FF
logic for the D input of the FF

Effect of reset can happen anytime,     Effect of reset will happen only on
asynchronously                          the active edge of a clock

Doesn't depend upon the presence of     Depends upon the presence of the
an active clock signal                  clock signal for the reset to happen

Asynchronous event is an overhead,      Works well when using cycle based
compared to synchronous reset, in       simulators
cycle based simulators

Not recommended for internally          For internally generated resets, the
generated resets, due to glitches       synchronous approach is the best
                                        mechanism

Reset input from external sources       Not prone to glitches from internal
can be prone to glitches; the final     or external sources
reset signal needs to be synchronized
before applying it to all storage
elements

Asynchronous reset input still needs    The additional synchronization
the double FF synchronization to        circuitry is not required, as it is a
avoid a race condition during           part of the default synchronous logic
de-assertion                            requirement

Needs to meet only the minimum          Reset pulse width has to be long
reset pulse width required for the FF   enough to be sampled on an active
                                        clock edge

Example code of an asynchronous low reset:

always @(posedge clk or negedge reset)
begin
    if (!reset)
        out1 <= 0;
    else
        out1 <= in1;
end

Example code for a synchronous low reset:

always @(posedge clk)
begin
    if (!reset)
        out2 <= 0;
    else
        out2 <= in2;
end
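
The double FF synchronization mentioned for the de-assertion of an asynchronous reset is commonly built as a small reset synchronizer. The following is a minimal sketch of that widely used structure, not a circuit taken from the original text; the module and signal names are assumptions.

```verilog
// Sketch: reset asserts asynchronously, de-asserts synchronously.
module reset_sync (clk, async_reset_n, sync_reset_n);  // hypothetical names
  input  clk;
  input  async_reset_n;            // active-low reset from outside
  output sync_reset_n;             // reset distributed to the design

  reg    meta;                     // first synchronizer stage
  reg    sync_reset_n;             // second synchronizer stage

  always @(posedge clk or negedge async_reset_n) begin
    if (!async_reset_n) begin
      meta         <= 1'b0;        // assertion takes effect without a clock
      sync_reset_n <= 1'b0;
    end else begin
      meta         <= 1'b1;        // de-assertion passes through two flops
      sync_reset_n <= meta;
    end
  end
endmodule
```

The output sync_reset_n can then drive the asynchronous reset pins of the other storage elements, so that all of them leave reset aligned to a clock edge.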



2.3.5 What logic gets synthesized when I use an integer instead of a
reg variable as a storage element? Is use of integer
recommended?

An integer can take the place of a reg as a storage element. An example
to illustrate this is as follows:

module int_insteadof_reg (in1, clk, reset, out1);
input [3:0] in1;
input clk, reset;
output [3:0] out1;

integer int_tmp;

// reg [3:0] int_tmp; // Normally we use this reg
                      // declaration

always @(posedge clk or negedge reset)
begin
    if (!reset)
        int_tmp <= 0;
    else
        int_tmp <= in1;
end

assign out1 = int_tmp;
endmodule

In this example, the variable int_tmp is defined as an integer, instead
of the reg that it would normally be (the reg declaration is commented out in
the example for illustration). Note that, although the default width of the
integer declaration is 32 bits, synthesis of the int_tmp register yields
only 4 bits. This is because the optimiser in the synthesis tool
removes the unnecessary higher order bits, in order to minimize the area.

Although the use of integer as shown above is a legal construct, it is not
recommended for the synthesis of storage elements.

2.4 Flow-control Constructs 

Verilog has primarily three kinds of flow control constructs, that is, case,
if-else and conditionals. The conditional (?:) construct has already been
discussed earlier in FAQs 2.1.2 and 2.1.3. This section primarily illustrates
the implementation details and questions about case and if-else constructs.

2.4.1 How do I choose between a case statement and a multi-way 
if-else statement? 

Both case and if-else are flow control constructs. Functionally in 
simulation they yield similar results. While both these constructs get 
elaborated into combinatorial logic, the usage scenarios for these constructs 
are different. 

A case statement is typically chosen for the following scenarios:

• When the conditionals are mutually exclusive and only one variable
controls the flow in the case statement. The case variable itself could be
a concatenation of different signals.

• To specify the various state transitions of a finite state machine

• Use of casex and casez allows use of x and z to represent don't-care bits
in the control expression

A multi-way if statement is typically chosen in the following scenarios:

• Synthesizing priority encoded logic

• When the conditionals are not mutually exclusive and more general in
using multiple expressions for the condition expression.

The advantages of using case over if-else are as follows:

• case statements are more readable than if-else

• When used for state machines, there is a direct mapping between the
state machine's "bubble diagram" and the case description.

In a case construct, if all the possible cases are not specified, and the
default clause is missing, a latch is inferred. Likewise, for an if-else
construct, if a final else clause is missing, a latch is inferred.

2.4.2 How do I avoid a priority encoder in an if-else tree? 

An if-else tree may synthesize to a priority encoded logic. For example, 
the following code produces a priority encoder: 

module priorityencoder (in0, in1, in2, in3, sel);

input in0, in1, in2, in3;
output [1:0] sel;
reg [1:0] sel;

always @(in0, in1, in2, in3) begin
    sel = 2'b00;
    if (in0) sel = 2'b00;
    else if (in1) sel = 2'b01;
    else if (in2) sel = 2'b10;
    else if (in3) sel = 2'b11;
end

endmodule // priorityencoder

In simulation, the if-else-if series is evaluated in the order listed. If in0
and in1 were both true, the in0 branch would be taken, because in0 is
evaluated first. Synthesis tools will create priority encoded logic in this
example, so that the logic generated will behave the same as the RTL
simulation.

If a priority encoder is not the intention, the logic needs to be synthesized
in parallel. The keyword unique, introduced in SystemVerilog, can
be used for this purpose. The unique keyword indicates that the order of
decisions is not important. The if statement would be the same, with the
unique keyword preceding the first if, as follows:

unique if (in0) sel = 2'b00;
else if (in1) sel = 2'b01;
else if (in2) sel = 2'b10;
else if (in3) sel = 2'b11;

This would synthesize into parallel logic, that is, a multiplexor.

The SystemVerilog standard requires that simulators (or other tools)
report a warning if they detect that more than one branch could be executed
at the same time. In the preceding example with unique if, if in0 and
in1 were both true at the same time, a run-time warning would be reported.

On a related note, SystemVerilog has also introduced the keyword 
priority, which functions opposite to unique, by enforcing priority encoded 
logic. When the priority construct is used, it indicates that the order of 
decision making is important. If the unique statement in the above is 
replaced by priority, then the same priority select logic tree will be 
regenerated. 
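
Where SystemVerilog is not available, one way to describe the same selection without an if-else chain is an AND-OR expression. This is an alternative technique sketched here for illustration, not the method of the original text, and it is only equivalent under the assumption that at most one of in0 to in3 is high at a time. The module name is hypothetical.

```verilog
// Sketch: parallel AND-OR selection in plain Verilog.
// Assumes at most one of in0..in3 is high at any time.
module onehot_select (in0, in1, in2, in3, sel);  // hypothetical name
  input        in0, in1, in2, in3;
  output [1:0] sel;

  // in0 selects 2'b00, so it contributes no term to the OR;
  // it is kept as a port only to match the earlier example.
  assign sel = ({2{in1}} & 2'b01) |
               ({2{in2}} & 2'b10) |
               ({2{in3}} & 2'b11);
endmodule
```

If two inputs are high together, this form ORs their codes rather than choosing by priority, which is why the one-hot assumption has to hold in the surrounding design.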

2.4.3 What are the differences between if-else and the (“?:”) 
conditional operator? 

The following table summarizes the differences between the two flow 
control constructs, that is, conditional and the if-else. 



Table 2-4. Summary of differences between the conditional and if-else
operator

2.4.4 What is the importance of a default clause in a case
construct?

The default clause in a case statement indicates that when all other cases 
are not met, then the flow can branch to the statements in the default clause. 

This gives the synthesis tool an option to pick a branch when no other 
condition is satisfied. If the default clause is missing, the logic will have to 
remember what the output was earlier, and hence a latch will get 
synthesized. For example, the following case statement will generate a latch: 

module default_latch (in1, in2, opcode, out1);
input [1:0] in1, in2, opcode;
output [1:0] out1;

reg [1:0] out1;

always @(in1 or in2 or opcode) begin
    case (opcode)
        2'b00 : out1 = in1 & in2;
        2'b01 : out1 = in1 | in2;
        2'b10 : out1 = in1 ^ in2;
     // 2'b11 : out1 = in1 % in2;   // uncommenting
                                    // either of these
                                    // two lines will
     // default : out1 = in1 & in2; // avoid a latch
    endcase
end

endmodule

In the above, with the two lines commented out, a latch gets synthesized for
the out1 register. Un-commenting either the default clause or the last
condition of 2'b11, or both, will result in the combinatorial logic of a
multiplexor being synthesized.
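
Another commonly used way to avoid the latch is to assign a default value to the output before the case statement, so that every path through the block assigns out1. The sketch below illustrates this; the module name and the default value of 2'b00 are assumptions made for illustration.

```verilog
// Sketch: default assignment ahead of an incomplete case statement.
module default_assign (in1, in2, opcode, out1);  // hypothetical name
  input  [1:0] in1, in2, opcode;
  output [1:0] out1;

  reg    [1:0] out1;

  always @(in1 or in2 or opcode) begin
    out1 = 2'b00;                  // assumed default value
    case (opcode)
      2'b00 : out1 = in1 & in2;
      2'b01 : out1 = in1 | in2;
      2'b10 : out1 = in1 ^ in2;
      // 2'b11 is not listed: out1 keeps the default assigned above
    endcase
  end
endmodule
```

Because out1 is always assigned, nothing has to be remembered between evaluations, and combinatorial logic is expected instead of a latch.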

2.4.5 What is the difference between full_case and parallel_case 
synthesis directive? 

The difference between the full_case and parallel_case synthesis directives
is summarized in the table below:

Table 2-5. Difference between full_case and parallel_case

full_case                               parallel_case
--------------------------------------  --------------------------------------
Indicates that the case statement has   Indicates that all case items need to
been fully specified, and all           be evaluated in parallel and not
unspecified case expressions can be     infer any priority encoding logic
optimized away

All control paths are specified         There is no overlap among the case
explicitly or by using a default        items

Helps avoid latches as all cases are    Results in multiplexor logic as a
fully specified                         parallel logic

Although not recommended, the           A priority encoder is NOT
default clause can be avoided, and      synthesized, as each path is unique
still not infer a latch

An example of a case statement that is full (and parallel) is shown below:

reg [1:0] var1;
always @(var1 or a or b or c) begin
    case (var1)
        2'b00 : out1 = a;
        2'b01 : out1 = b;
        2'b10 : out1 = c;
        2'b11 : out1 = a & b;
    endcase
end

Note that the default clause was not required here as the case is fully
specified (although having it is a good coding practice).

An example of a case statement that is parallel (not full) is shown as
follows:

reg [2:0] var1;
always @(var1 or a or b or c) begin
    case (var1)
        3'b000 : out1 = a;
        3'b001 : out1 = b;
        3'b010 : out1 = c;
        // rest of the cases are
        // not defined
    endcase
end

Note that the above case doesn't have a default clause. Each branch is
unique, but not all cases are specified, that is, branches are missing for
3, 4, 5, 6, 7. The out1 register will get synthesized into a latch.
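
The directives themselves are written as comments next to the case expression. The exact spelling is tool specific; the sketch below uses the Synopsys-style pragma form and should be read as an assumption to be checked against the documentation of the synthesis tool in use. The module name is hypothetical.

```verilog
// Sketch: Synopsys-style full_case / parallel_case pragmas.
module case_directives (var1, a, b, c, out1);  // hypothetical name
  input  [2:0] var1;
  input        a, b, c;
  output       out1;

  reg          out1;

  always @(var1 or a or b or c) begin
    case (var1) // synopsys full_case parallel_case
      3'b000 : out1 = a;
      3'b001 : out1 = b;
      3'b010 : out1 = c;
      // remaining values are treated as don't-care by synthesis
    endcase
  end
endmodule
```

Since the pragma is only a comment to the simulator, simulation still holds the previous value of out1 for the unlisted codes, while the synthesized logic does not. This is a possible source of simulation and synthesis mismatch.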



2.4.6 What is the difference in implementation with sequential and 
combinatorial processes, when the final else clause in a multi- 
way if-else construct is missing? 

The results are different, depending upon whether the if statement is a
part of a sequential always block or a combinatorial always block.

In a combinatorial always block, when the final else clause in a multi-way
if-else statement is missing, it will infer a latch. The latch is inferred
because the register has to remember the value until it is reloaded again. For
example,

reg latch1;

always @(sel, in1) begin
    if (sel)
        latch1 <= in1;
end

In a sequential always block, if the final else clause in a multi-way if-else
statement is missing, it will still go ahead and infer the flip-flop, with the
combinatorial inference of the logic in the D input of the flop. For example,

reg ff1;

always @(posedge clk or negedge reset) begin
    if (!reset)
        ff1 <= 1'b0;
    else begin
        if (sel1)
            ff1 <= in1; // no else clause here
    end
end

The above code will infer logic, as shown below. The D input to the flop
is now a simple gated function of the inputs.



[Figure: in1 and sel feed a gate that drives the d input of a flip-flop
clocked by clk; the q output is out1]

Figure 2-9. Logic inference of if statement without final else in a FF
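
In simulation, omitting the final else in a clocked block means the flip-flop simply keeps its previous value. The sketch below writes that hold explicitly; it is an illustrative rewrite under assumed names (ff_hold, sel1, in1, ff1), not code from the original text.

```verilog
// Sketch: the missing else written out as an explicit hold.
module ff_hold (clk, reset, sel1, in1, ff1);   // hypothetical name
  input  clk, reset, sel1, in1;
  output ff1;

  reg    ff1;

  always @(posedge clk or negedge reset) begin
    if (!reset)
      ff1 <= 1'b0;
    else if (sel1)
      ff1 <= in1;
    else
      ff1 <= ff1;      // explicit hold; behaves like the omitted else
  end
endmodule
```

Both forms describe a flip-flop that loads in1 only when sel1 is true, so no latch is involved in either case.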

2.4.7 What is the difference in using (== or !=) vs. (=== or !==) in
decision making of a flow control construct in a synthesizable
code?

In Verilog, the (==) operator is called logical equality, and (!=) is called 
logical inequality operator. The (===) operator is called case equality, and 
(!==) is called case inequality. The following are the differences in using 
these constructs in synthesizable code. 



Table 2-6. Differences between == and === operators

Use of == or != operators               Use of === or !== operators
--------------------------------------  --------------------------------------
If either of the operands has an x or   The operands will be compared,
z value, the result is unknown          even if they have x and z values in
                                        the bits

If any of the operands is x or z, the   The x and z bits will be used in
logical result of the comparison is     the comparison, and the logical
always FALSE                            result will be TRUE or FALSE, based
                                        on the actual comparison

Since the operands contain x and z,     Since x and z are also used in the
the result will be an x. Hence, the     comparison, the result of the
comparison can be non-deterministic     comparison will be Boolean 1 or 0.
                                        Hence the comparison can be
                                        deterministic

Example of using (== or !=) operators:

if (a == b)
    out1 = a & b;
else
    out1 = a | b;

If either a or b becomes x or z, the else clause will be executed and
out1 will be driven by the OR gate.

Example of using (=== or !==) operators:

if (a === b)
    out1 = a & b;
else
    out1 = a | b;

If a and b are identical, even if they become x or z, the if clause will be
executed and out1 will be driven by the AND gate.
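
The difference can be observed directly in simulation. The following is a small testbench sketch; the module name and the operand values are chosen only for illustration.

```verilog
// Sketch: logical equality versus case equality with an x bit.
module tb_equality;                // hypothetical testbench name
  reg [3:0] a, b;

  initial begin
    a = 4'b10x1;
    b = 4'b10x1;

    $display("a ==  b gives %b", a == b);    // x: the x bit is ambiguous
    $display("a === b gives %b", a === b);   // 1: x matches x bit for bit

    if (a == b)  $display("== : if branch taken");
    else         $display("== : else branch taken");   // x counts as false

    if (a === b) $display("===: if branch taken");     // exact match
    else         $display("===: else branch taken");
  end
endmodule
```

The case equality operators are meaningful only in simulation, because real hardware has no x value to compare against.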



2.4.8 Explain the differences and advantages of casex and casez 
over the case statement? 

The casex statement has to be used when both the high impedance value
(z) and unknown (x) in any bit have to be treated as a don't-care during case
comparisons. The casez statement treats only the high impedance value (z) as
a don't-care during case comparisons.

In both cases, the bits that are treated as don't-care will not be
considered for comparison, that is, only bit values other than don't-care
bits are used in the comparison. The wildcard character "?" can be used in
place of "z" for literal numbers.

The following is an example of a casex statement 



input [2:0] in1;
reg [2:0] reg1;
casex (in1)
  3'b0x0  : out1 = a & b;  // same as conditions
                           // 3'b010, 3'b000
  3'bx10  : out1 = a | b;  // same as conditions
                           // 3'b110, 3'b010
  default : out1 = a ^ b;  // for all other
                           // conditions
endcase



The same example, if written with an if-else tree, would look like: 

// bit in1[1] is not considered at all
if (!in1[2] & !in1[0]) out1 = (a & b);

// bit in1[2] is not considered
else if (in1[1] & !in1[0]) out1 = (a | b);

// default clause
else out1 = (a ^ b);

Using casex or casez has the following coding advantages: 

• It reduces the number of lines, especially when the number of bits is
large.

• It makes the code look clearer and less cluttered.

• It simplifies the optimization, as it is clear that the bits with x are to be
ignored.
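The use of the “?” wildcard with casez can be sketched with a small priority encoder. This is a minimal illustration, not code from the original text; the module name prio_enc and its ports are hypothetical.

```verilog
// Hypothetical 4-bit priority encoder using casez with "?" wildcards.
module prio_enc (req, grant);
  input  [3:0] req;
  output [1:0] grant;
  reg    [1:0] grant;

  always @(req) begin
    casez (req)
      4'b1??? : grant = 2'd3; // bit 3 set; the lower bits are don't-care
      4'b01?? : grant = 2'd2; // bit 2 is the highest bit set
      4'b001? : grant = 2'd1; // bit 1 is the highest bit set
      default : grant = 2'd0; // covers 4'b0001 and 4'b0000
    endcase
  end
endmodule
```

Writing the same function with a plain case statement would need all sixteen values of req to be listed or grouped explicitly.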

2.5 Finite State Machines 

Finite State Machines or FSMs form an important part of the control 
logic in the designs. This section also discusses the differences among the 
various types of the FSM coding styles. 
2.5.1 What are the differences between synchronous and
asynchronous state machines?

Synchronous and asynchronous are two fundamental types of state 
machines. They differ in the following ways: 



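Table 2-7. Differences between asynchronous and synchronous state machines

Asynchronous state machines            | Synchronous state machines
---------------------------------------+----------------------------------------
State transitions depend upon the      | State transitions are controlled by a
order in which the input signals       | clock signal
change                                 |
---------------------------------------+----------------------------------------
State transitions happen after the     | State transitions happen at intervals
propagation delay of the state line    | of the clock period
---------------------------------------+----------------------------------------
Delay lines act as memory elements     | Edge-triggered FFs or level-sensitive
                                       | latches act as storage elements
---------------------------------------+----------------------------------------
Output response time is not            | Output response time is predictable;
predictable                            | will happen at clock period intervals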



2.5.2 Illustrate the differences between Mealy and Moore state 
machines. 

The Mealy machine and the Moore machine are two commonly used
styles of state machines. The basic block diagrams of these two state
machines are shown as follows:

Figure 2-10. Block diagram of a Mealy machine

Figure 2-11. Block diagram of a Moore machine

The state machines differ in the following ways:



Table 2-8. Differences between Mealy and Moore state machines

Mealy machine                          | Moore machine
---------------------------------------+----------------------------------------
Outputs are a function of current      | Outputs are a function of current
state and input signals                | state only
---------------------------------------+----------------------------------------
Outputs can change between state       | Outputs change only when the
changes                                | current state changes
---------------------------------------+----------------------------------------
Outputs can change any number of       | Output is delayed by one clock
times during a clock cycle, which      | cycle, but is stable
may result in glitches on the outputs  |
---------------------------------------+----------------------------------------
More output combinations are           | Since the outputs are a function only
possible as the outputs are a function | of the current state, the number of
of inputs too                          | output combinations is fewer than with
                                       | the Mealy machine
---------------------------------------+----------------------------------------
If the inputs are not registered, the  | Can expect higher frequency compared
combinatorial paths could potentially  | to a Mealy machine, as the
be longer than in a Moore machine;     | combinatorial paths are typically
hence, a relatively lower frequency    | shorter, and no input paths are
is expected compared to a Moore        | involved
machine                                |
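The first rows of Table 2-8 can be illustrated with a detector for two consecutive 1s on an input. The following is a minimal sketch under assumed names (module seq11, input din, outputs out_mealy and out_moore); it is not taken from the original text.

```verilog
// Hypothetical "two consecutive 1s" detector with both output styles.
module seq11 (clk, rst_n, din, out_mealy, out_moore);
  input  clk, rst_n, din;
  output out_mealy, out_moore;

  parameter IDLE = 2'b00, GOT1 = 2'b01, GOT2 = 2'b10;
  reg [1:0] state, next_state;

  // State register
  always @(posedge clk or negedge rst_n)
    if (!rst_n) state <= IDLE;
    else        state <= next_state;

  // Next-state logic
  always @(state or din)
    case (state)
      IDLE    : next_state = din ? GOT1 : IDLE;
      GOT1    : next_state = din ? GOT2 : IDLE;
      GOT2    : next_state = din ? GOT2 : IDLE;
      default : next_state = IDLE;
    endcase

  // Mealy output: a function of the current state and the input.
  // It asserts in the same cycle as the second 1, and follows din.
  assign out_mealy = (state != IDLE) & din;

  // Moore output: a function of the current state only.
  // It asserts one clock later, and changes only with the state.
  assign out_moore = (state == GOT2);
endmodule
```

Note that out_mealy has a combinatorial path from din to the output, whereas out_moore depends only on the state flops.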



2.5.3 Illustrate the differences between binary encoding and one-hot
encoding mechanisms in state machines.

The encoding in state machines is primarily either binary (sometimes
called sequential) encoding or one-hot encoding. Both mechanisms
eventually lead to decoding of the states, but their logic implementation,
timing, and area implications differ. The differences are summarized in the
following table:



Table 2-9. Differences between binary and one-hot encoding

Binary encoding                        | One-hot encoding
---------------------------------------+----------------------------------------
Requires fewer FFs to represent the    | Number of FFs required is equal to
current state                          | the number of states in the FSM
---------------------------------------+----------------------------------------
As there is a combinatorial path in    | Better output timing, as there is no
the output logic, its timing is not as | output logic; only the clk->q delay,
good as the one-hot encoding mechanism | and hence faster
---------------------------------------+----------------------------------------
Preferred approach in ASICs unless     | Useful and necessary in register-rich
the timing in the output path is       | applications like FPGAs
critical                               |
---------------------------------------+----------------------------------------
Since the number of FFs is limited,    | No need to optimize the state
good optimization is required for the  | encoding, as each state has its own
encoding                               | flop anyway
---------------------------------------+----------------------------------------
Adding or deleting states requires     | Easy to maintain, that is, adding or
tracking the side effects on the other | deleting states is easy, and doesn’t
states in the FSM                      | affect the rest of the states
---------------------------------------+----------------------------------------
Tedious to debug, since a wrong state  | Easy to debug, since a wrong state
transition needs a walk through of the | transition can be easily detected by
next state combinatorial logic         | looking at the current state values
---------------------------------------+----------------------------------------
Critical path analysis requires        | Easy to find critical paths during
tracking the combinatorial logic       | Static Timing Analysis (STA)
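The FF count and output-decode rows of the table can be seen in a pair of four-state sequencers. This is only a sketch; the module names ring_binary and ring_onehot and the output name last are assumptions for illustration.

```verilog
// Binary encoding: 2 FFs for 4 states; the output needs decode logic.
module ring_binary (clk, rst_n, last);
  input  clk, rst_n;
  output last;
  reg [1:0] state;

  always @(posedge clk or negedge rst_n)
    if (!rst_n) state <= 2'b00;
    else        state <= state + 2'b01;   // 00 -> 01 -> 10 -> 11 -> 00

  assign last = (state == 2'b11);         // combinatorial decode of the state
endmodule

// One-hot encoding: 4 FFs for 4 states; the output is a flop bit.
module ring_onehot (clk, rst_n, last);
  input  clk, rst_n;
  output last;
  reg [3:0] state;

  always @(posedge clk or negedge rst_n)
    if (!rst_n) state <= 4'b0001;
    else        state <= {state[2:0], state[3]};  // rotate the hot bit

  assign last = state[3];                 // taken directly from a flop output
endmodule
```

Both modules assert last during the fourth state; the one-hot version trades two extra flops for the removal of the output decode.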
2.5.4 Explain a reversed case statement, and how it can be useful
to infer a one-hot state machine?

The case expression need not necessarily be a variable. When a constant 
is used in a case expression, the value of the constant expression will be 
compared against each of the case item expressions. This is called a reversed 
case statement. This coding style fits the one-hot state machine scenario very 
well. 

In the following code, a one-hot state machine is illustrated using a
reversed case statement. Since the case expression is the constant 1'b1,
each case item is compared against it in order, and the first case item that
evaluates to 1 is selected, after which the case statement is exited.

module one_hot (clk, rst_n, rd_n, ready, done,
                out0, out1, out2, out3);

input clk, rst_n, rd_n, ready, done;
output out0, out1, out2, out3;

parameter drive_out0 = 4'b0001;
parameter drive_out1 = 4'b0010;
parameter drive_out2 = 4'b0100;
parameter drive_out3 = 4'b1000;

reg [3:0] current_state, next_state;

// the sequential process
always_ff @(posedge clk or negedge rst_n)
  if (rst_n == 1'b0)
    current_state <= drive_out0;
  else
    current_state <= next_state;

// The combinatorial process
always_comb
begin
  next_state = current_state;
  case (1'b1)
    current_state[0] : // drive_out0
      if (~rd_n)
        next_state = drive_out1;
      else
        next_state = drive_out2;
    current_state[1] : // drive_out1
      if (!ready)
        next_state = drive_out3;
      else if (done)
        next_state = drive_out0;
    current_state[2] : // drive_out2
      if (!ready)
        next_state = drive_out3;
      else if (done)
        next_state = drive_out0;
    current_state[3] : // drive_out3
      if (ready & ~rd_n)
        next_state = drive_out1;
      else if (ready & rd_n)
        next_state = drive_out2;
    default: next_state = drive_out0;
  endcase
end

assign out0 = current_state[0]; // no operation
assign out1 = current_state[1]; // read operation
assign out2 = current_state[2]; // write operation
assign out3 = current_state[3]; // waiting for ready

endmodule



2.6 Memories 

Memories form an important part of chip design. A memory can be as
small as a simple register array or as large as a cache. The presence
of memories on chips is increasing as die sizes grow. This
section discusses the implications of inferring multi-dimensional arrays as
memories in designs and a few considerations in choosing memories
from technology vendors.

2.6.1 Illustrate how a multi-dimensional array is implemented. 

Static memories can be synthesized by the synthesis tools implementing
the storage element inferred within the array construct. The following is
example code for synthesizing small synchronous static memories that can
be used like a simple register file within the larger design.

module my_memory (datai, datao, clk, wr_n, rd_n, addr);

parameter width = 4;
parameter log2_depth = 16;

input [width-1:0] datai, addr;
input clk, wr_n, rd_n;
output [width-1:0] datao;

reg [width-1:0] memory [log2_depth-1:0];
reg [width-1:0] datao;

always @(posedge clk) begin
  if (wr_n == 1'b0)
    memory[addr] <= datai;
  else if (rd_n == 1'b0)
    datao <= memory[addr]; // Synchronous read
end

// Combinatorial read
// assign datao = memory[addr];

endmodule // my_memory

The above code effectively synthesizes 64 FFs whose inputs and outputs 
will be tapped based on the address values. 

Verilog-2001 has introduced multi-dimensional memories. The same 
example above can be extended for three dimensions of the memory, that is, 
x, y and z, as follows: 

module my_memory (datai, datao, clk, wr_n, rd_n,
                  addr_x, addr_y, addr_z);

parameter width = 4;
parameter log2_d = 4;

input [width-1:0] datai, addr_x, addr_y, addr_z;
input clk, wr_n, rd_n;
output [width-1:0] datao;

reg [width-1:0] memory [log2_d-1:0]   // addr_x
                       [log2_d-1:0]   // addr_y
                       [log2_d-1:0];  // addr_z
reg [width-1:0] datao;

always @(posedge clk) begin
  if (wr_n == 1'b0)
    memory[addr_x][addr_y][addr_z] <= datai;
  else if (rd_n == 1'b0)
    // synchronous datao
    datao <= memory[addr_x][addr_y][addr_z];
end

// combinatorial datao
// assign datao = memory[addr_x][addr_y][addr_z];
endmodule // my_memory

The multi-dimensional arrays above would eventually get synthesized
into (x*y*z*width) = (4*4*4*4) = 256 individual FFs. Multiplexers placed
on the Q outputs of these FFs, and gating logic on their D inputs,
determine the data in and data out.

Using a hard macro of a memory from a semiconductor vendor gives better
timing, area, and power than synthesizing the memory from discrete logic,
as the logic of the macro is optimally placed.

Note that instantiating a technology specific memory will make the 
design non-reusable with a different technology. One of the 
recommendations in this inevitable situation is to bring the pins of the 
memory all the way to the top level of the module, and instantiate the design 
and the memory in a wrapper, and not within the core of the design. Since 
most vendors have similar pin-outs of memory design, the user can also have 
a choice to instantiate the memory from any vendor in the wrapper. The 
wrapper can then be instantiated in the top-level netlist. 
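The wrapper recommendation above can be sketched as follows. All module and port names here (vendor_mem_16x4, design_core, design_wrapper) are hypothetical, and the memory is only a behavioral stand-in for a real vendor macro, whose pin-out should be taken from the vendor's documentation.

```verilog
// Behavioral stand-in for a vendor memory macro (hypothetical name and pins).
module vendor_mem_16x4 (clk, wr_n, rd_n, addr, datai, datao);
  input        clk, wr_n, rd_n;
  input  [3:0] addr, datai;
  output [3:0] datao;
  reg    [3:0] datao;
  reg    [3:0] mem [15:0];

  always @(posedge clk)
    if (wr_n == 1'b0)      mem[addr] <= datai;
    else if (rd_n == 1'b0) datao <= mem[addr];
endmodule

// Technology-independent core: the memory pins are brought out as ports.
module design_core (clk, rst_n, mem_wr_n, mem_rd_n, mem_addr,
                    mem_datai, mem_datao);
  input        clk, rst_n;
  input  [3:0] mem_datao;          // read data returned by the memory
  output       mem_wr_n, mem_rd_n;
  output [3:0] mem_addr, mem_datai;
  reg    [3:0] mem_addr;

  // Placeholder behavior: step through the addresses, writing each address
  // value into its own location.
  always @(posedge clk or negedge rst_n)
    if (!rst_n) mem_addr <= 4'd0;
    else        mem_addr <= mem_addr + 4'd1;

  assign mem_wr_n  = 1'b0;
  assign mem_rd_n  = 1'b1;
  assign mem_datai = mem_addr;
endmodule

// Wrapper: the only module that names the technology-specific memory.
module design_wrapper (clk, rst_n);
  input clk, rst_n;
  wire       wr_n, rd_n;
  wire [3:0] addr, datai, datao;

  design_core u_core (.clk(clk), .rst_n(rst_n),
                      .mem_wr_n(wr_n), .mem_rd_n(rd_n), .mem_addr(addr),
                      .mem_datai(datai), .mem_datao(datao));

  vendor_mem_16x4 u_mem (.clk(clk), .wr_n(wr_n), .rd_n(rd_n),
                         .addr(addr), .datai(datai), .datao(datao));
endmodule
```

With this arrangement, retargeting to another vendor would only require changing the memory instance inside design_wrapper, leaving design_core untouched.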

Check with your semiconductor vendor for the availability of the type of 
memory that you are interfacing into your system design. 
2.6.2 What are the considerations in instantiating
technology-specific memories?

Instantiating technology-specific memories is required in many
applications. Depending upon the application, the choice of memory is based
on the following performance variables:

• Area: If the area is the prime concern on the die, then a high-density 
memory is required. This is typically targeted for high volume 
applications or chips with large on chip memory blocks. The overall 
area will also depend upon the process technology of the memory 
block. 

• Frequency: If the speed is the prime concern, then high-speed 
memories are required which operate at high frequencies. Note that 
these memories could potentially be larger in area. 

• Power: This is one of the critical concerns for low voltage and low 
power applications of chips in cellular phones, hand held devices, 
etc. Also, if the power dissipation becomes high, then the operating 
conditions begin to be de-rated, to the extent that the performance of 
the overall system becomes lower. It also increases the cost of final 
packaging of the chip for dissipation purposes. Note that power 
dissipation is tightly coupled with the frequency at which the 
memory will be used. 

The other design variables in considering the memories are: 

• Capacity: The capacity of the memory is typically specified in 
the resolution of bits. For example, a memory is specified as 
512Kbits. 

• Voltage: Since some memories are designed for specific voltage 
ranges, it is important to pick the memory meeting the desired 
voltage ranges. 

• Synchronous or asynchronous: This variable specifies whether 
the memory will have a synchronous read/write or an 
asynchronous read/write. Which one is to be used primarily 
depends upon the presence of a clock element, and the matching 
of timing requirements of the memory and the design. 

• Single port or multi port: This variable determines whether the
storage within the memory is accessed by a single read/write port
or multiple ports. One of the critical issues during the use of a
multi-port memory is the resolution of what happens when
multiple ports are trying to do a write to the same memory
location.

• Flip-flop or latch based: This variable determines if the storage 
element within the memory is based on a flip-flop or the latch. 
The important considerations for this memory are the testability 
and power. Note that a FF based design is more testable than a 
latch based design. 

• Scannable or not: With the size of the memory increasing
nowadays, the scannability of the memory is an important
criterion. Many manufacturers and vendors provide BIST
logic for making the memory scannable.

Just like any other electronic component, the following manufacturing 
variables also need to be considered in choice of memories for a mass 
production application: 

• Unit cost: This variable will eventually drive the overall cost of 
the chip, board, and the system itself. It matters a lot in a mass 
production scenario. 

• Availability: Availability of the memories will impact the time 
to market for the end-product success. 

• Failure rate: The yield of the memory must be high, and the 
failure rate must be low. BIST circuits will be required to be 
added within the chip, along with the memories to test them. 

The choice of memory will depend upon what the end application is, and 
hence requires a good balance in all the above considerations. 

2.6.3 What are the factors that dictate the choice between 
synchronous and asynchronous memories? 

Synchronous memories, as the name suggests, have a clock as one of the 
primary inputs. All the writes happen, based on the rising or falling edge of 
this clock, when the data meets the setup time requirements. All reads 
happen from the Q output of the flops, after the data ready time. 

Asynchronous memories don’t have a clock interface. The data writes 
and reads typically happen with an enable pin. 

The main differences between the two memories are as follows: 
Table 2-10. Differences between synchronous and asynchronous memories

Synchronous memories                   | Asynchronous memories
---------------------------------------+----------------------------------------
Data writes and reads are based on a   | Data writes and reads are typically
clock port                             | based on an enable pin
---------------------------------------+----------------------------------------
Has better static timing, because the  | The data output from an asynchronous
data output from a synchronous         | memory is a combinatorial lookup of its
memory is registered                   | address inputs; therefore, this
                                       | combinatorial logic could potentially
                                       | become a critical element in the
                                       | timing path
---------------------------------------+----------------------------------------
Has larger area compared to an         | Area is less compared to a
asynchronous memory                    | synchronous memory
---------------------------------------+----------------------------------------
Read operations generally take a       | Both read and write cycles are
minimum of two clock cycles: the first | asynchronous, based on “enable” pins
cycle is usually used by the memory    |
to sample the address, and the second  |
cycle is used by the external system   |
to sample the read data                |



2.7 General Design Considerations 

This section briefly discusses the general design considerations like 
reusability and other factors that need to be considered early in the design 
cycle. Reusability of a design is not something that should be deferred until 
the end of an implementation. This needs to be considered early and all the 
way during the implementation of the design. 

2.7.1 What are some reusable coding practices for RTL Design? 

A reusable design mainly helps in reducing the design time of larger 
implementations using IPs. The topic of reusability has been very well 
discussed in the Reuse Methodology Manual (see References at the end of 
this book for details of the book). The following key points summarize the 
main considerations during the implementation phase: 

• Register all the outputs of crucial design blocks. This will make the 
timing interface easy during system level integration 

• If an IP is being developed in both Verilog and VHDL, try to use the 
constructs that will be translatable later into VHDL. 

• Avoid snake paths, as they make debugging tedious and
synthesis inefficient.

• Partition the design considering the clock domains and the functional
goals.

• Follow lexical and naming conventions that are self-descriptive and 
facilitate future product maintenance. 

• Avoid instantiation of technology specific gates 

• Use parameters instead of hard-coded values in the design 

• Avoid clocks and resets that are generated internal to the design 

• Avoid glue logic during top level inter-module instantiations 

2.7.2 What are “snake” paths, and why should they be avoided? 

A snake path, as the name suggests is a path that traverses through a 
number of hierarchies, and may eventually return back to the same hierarchy 
from which it originated. 

Snake paths must be avoided in a design for the following reasons: 

• It will constitute a long timing path, and hence, be the surprise 
critical path when static timing analysis is done at the top level. It 
may not show up during the timing analysis of the unit level blocks if 
it is poorly constrained. 

• The synthesis tools need to put more effort in characterizing the 
constraints of the path across the hierarchies, and the compile time 
can get higher. 

Some tips that can be followed to avoid the snake paths are: 

• Register the outputs of modules with different functional objectives. 

• Partition the design functionally, to avoid long paths across different
hierarchies.

Keep checking for the presence of the snake paths by periodically 
running synthesis on the fully integrated RTL, even if it is not fully verified 
functionally. This will give early feedback through the timing reports for the 
presence of a path traversing across multiple hierarchies. 

2.7.3 What are a few considerations while partitioning large 
designs? 

A large design needs to be approached in a hierarchical fashion. The 
following considerations need to be taken while partitioning these designs: 
• Functionality: The functional grouping of the logic within a hierarchy
is the prime criterion when partitioning the design. Typical partitions
of the hierarchy are:

o Address and data paths: This module typically contains the 
address and data path registers, which drive the address and data 
buses of the primary outputs. 

o Control logic: This module typically contains Finite State 
Machines (FSMs), and the module gets the inputs for the FSMs, 
whose outputs drive the controls for the rest of the logic. 

• Clock domains: In a multiple clock design, it is recommended to group 
the logic connected in the same clock domain in a single module. When 
signals need to interact with another module with a different clock, it is 
recommended to go through a synchronizer module, which takes in the 
input from the source clock domain and synchronizes it to the clock 
domain of the destination module. 

• Area: Having too little logic in a module will create too many
hierarchies, and too much logic within a single module will make it
difficult to exercise fine-grained control during floorplanning later
in the backend process.

Verilog doesn’t impose any limit on the number of hierarchies, but it is
a good practice to not have too many (lots of leaf level hierarchies of FFs) or
too few (just one huge module!) hierarchies.

2.8 Multiple clock Design Considerations 

While each module may work well at its unit level, it is important to
consider reliability when the signals of one design unit
communicate with the signals of other design units. A fully
synchronous design approach is quite helpful, but the presence of multiple
clock domains in a circuit is becoming common. Reliability becomes especially
challenging when signals are communicated across clock domains. This
section discusses a few issues to be considered when signals cross clock
domains, and how the reliability can be improved.

2.8.1 How can I reliably convey control information across clock
domains?

When control signals traverse clock domains, the signal
appears as an asynchronous input at the destination clock domain. Hence,
this signal needs to be synchronized to meet the setup and hold requirements
of the destination clock domain, so that the downstream logic can have valid
logic levels. Otherwise, the FF may enter a meta-stable state, in which
case it will not be able to arrive at a valid state in a given amount of time.
The output of a meta-stable FF can be at an intermediate voltage level, or
may oscillate, invalidating logic down the signal path.

One of the common methods is to have a two-stage synchronizer, that is, two
FFs in series, between the source and destination clocks. If the first FF
enters into a meta-stable state, due to any race condition between the clk
and D inputs, then the Q value captured in the first flip-flop is an unknown,
that is, either a 1 or a 0 (“x” in simulation), depending upon the
resolution of the changes in the inputs.

By having two flip-flops in series, the second flip-flop is very likely to
capture the resolved state of the first flip-flop as stable data, even if the
first one is meta-stable for a time after the rising edge of the clock.

The following is a 2-stage synchronizer. Note that the data is coming
from source clk1 while the two FFs are driven by clk2.



Figure 2-12. 2 FF synchronizer (data driven by the source clock passes
through two D flip-flops in series, both clocked by the destination clock,
to produce data synchronized to the destination clock domain)



Some chip and IP vendors even have a special optimized cell just for the
synchronization purpose. Although these cells have smaller setup and hold
time requirements, they may be larger in area than normal FFs and
also consume more power. Note that instantiating such technology-specific
cells could make the design non-reusable with a different library vendor. In
such a case, it is recommended to define a module containing the two
synchronizing flip-flops and instantiate that module in the design.
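A module of the kind recommended above could look like the following minimal sketch. The module name sync_2ff, the port names, and the use of an active-low asynchronous reset are assumptions made for illustration.

```verilog
// Generic two-stage synchronizer (hypothetical module and port names).
module sync_2ff (clk_dst, rst_n, d_async, q_sync);
  input  clk_dst;   // destination clock
  input  rst_n;     // active-low asynchronous reset
  input  d_async;   // level signal coming from the source clock domain
  output q_sync;    // signal synchronized to clk_dst
  reg    meta, q_sync;

  always @(posedge clk_dst or negedge rst_n)
    if (!rst_n) begin
      meta   <= 1'b0;
      q_sync <= 1'b0;
    end else begin
      meta   <= d_async;  // first stage: may go meta-stable
      q_sync <= meta;     // second stage: samples the resolved value
    end
endmodule
```

Keeping the two flops in their own module makes it straightforward to swap in a vendor synchronizer cell later, since only this module needs to change.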

The above synchronizer only takes care of level signals that are held long
enough to be sampled by the next rising edge of the destination clock. In the
case of a pulsed signal transmission, with pulse widths that could be less
than the destination clock period, the above synchronizer logic is not
helpful. Readers are encouraged to read about good design implementation in
the following reference, titled “Crossing the abyss: asynchronous signals in
a synchronous world”, which can be found at the following URL:
http://www.reed-electronics.com/ednmag/article/CA310388?pubdate=7%2F24%2F2003
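One commonly used way to carry a narrow pulse across clock domains is to convert it into a level change in the source domain, synchronize that level, and detect the change in the destination domain. The following sketch is an assumption made for illustration and is not taken from the reference above; the module name pulse_sync and its ports are hypothetical, and it assumes that source pulses are spaced far enough apart for each toggle to be sampled by the destination clock.

```verilog
// Hypothetical toggle-based pulse synchronizer.
module pulse_sync (clk_src, clk_dst, rst_n, pulse_src, pulse_dst);
  input  clk_src, clk_dst, rst_n;
  input  pulse_src;   // one clk_src cycle wide
  output pulse_dst;   // one clk_dst cycle wide
  reg    toggle;      // source-domain level, flips on every source pulse
  reg    s1, s2, s3;  // destination-domain synchronizer and edge detect

  // Source domain: turn each pulse into a level change.
  always @(posedge clk_src or negedge rst_n)
    if (!rst_n)         toggle <= 1'b0;
    else if (pulse_src) toggle <= ~toggle;

  // Destination domain: two-stage synchronizer plus one delay stage.
  always @(posedge clk_dst or negedge rst_n)
    if (!rst_n) {s1, s2, s3} <= 3'b000;
    else        {s1, s2, s3} <= {toggle, s1, s2};

  // Any change of the synchronized level yields a one-cycle pulse.
  assign pulse_dst = s2 ^ s3;
endmodule
```

Because the information is carried by a level rather than by the pulse itself, the pulse width relative to the destination clock period no longer matters.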

2.8.2 What is a safe strategy to transfer data of different bus- 
widths and across different clock domains? 



When data is to be transferred across different bus widths and different
clock domains, a FIFO (First In First Out) is an ideal component. If the bus
widths of the write side (the side which pushes the data into the FIFO) and
the read side (the side which pops the data from the FIFO) are different,
then it becomes an asymmetrical FIFO. Many IP and chip vendors have
asymmetrical and dual clock FIFOs in their libraries. An entity diagram of a
typical asymmetrical and dual clock FIFO is shown in the following figure:

Figure 2-13. Asymmetrical width FIFO

The flags in the FIFO above are typically the full, empty, almost-full and 
almost-empty. The thresholds for these FIFOs can either be set as an input 
signal or as an instantiating parameter. The widths of the wr_data and 
rd_data busses are different, but are usually related by an integral 
multiple (that is, one width is an integral multiple of the other). 

2.8.3 What are a few considerations while using FIFOs for posted 
writes or prefetched reads that influence the speed of the 
design? 



FIFOs are typically used in numerous data transfer applications for 
performance and sustenance reasons. One of the main applications of FIFOs 
is to post write transactions and to prefetch the data reads. 
The advantages of using the FIFOs for posted writes or prefetched reads
are:

• FIFOs in general help as a temporary storage buffer, which stores the 
data written from the write path until it is popped out by the consumer. 
Thus, in an application like a bridge across two protocol buses with 
different frequencies, FIFOs help in completing the bus cycles of a faster 
host sooner. This allows other masters in the host bus to use the bus 
more efficiently. 

• The performance of write data transfer from a bridge that is faster is a lot 
better when it stores the data in the FIFO, as it doesn’t have to be held 
up by a slower slave through wait states during the individual beats of 
the data transfer. 

The disadvantages of using the FIFOs for posted writes or prefetched
reads are:

• Suppose the originating master posts the data into the FIFO, and 
assumes the data transfer to have happened to the destination slave, and 
the slave now issues an ERROR. It has to be communicated back to the 
master, since it assumes the data transfer to have taken place. Typically, 
in SoC environments, it is taken care of by issuing a high priority 
interrupt to the host or the originating master. 

• If the originating master aborts a read transaction late in the cycle when
the read prefetch has already taken place, there is a possibility of stale
data remaining in the read FIFO. When such a condition occurs, the read
FIFO may need to be flushed before a new read transaction.

• In order to ensure data coherency when a read follows a write,
all reads to the same slave address space must be blocked until
the previous write transaction is completed. This is typically monitored
by watching the empty signal of the FIFO.

In general, FIFOs are very useful to reduce bus latencies and functionally 
necessary when the bus widths are asymmetrical. 

2.9 Common “Gotchas” in Synthesizable RTL 

This section explains how and why certain unintentional “gotchas” occur 
after coding. 
2.9.1 What will be synthesized of a module with only inputs and no
outputs?

A module with only inputs and no outputs will synthesize into a module 
with no logic, since there is nothing to be synthesized as an output. 

2.9.2 Why do I see latches in my synthesized logic? 

There is more than one reason why latches could be seen in synthesized 
logic. This information is typically present in the elaboration log file of the 
synthesis tool. 

• The if-else clause in the always block to which the latch is
associated doesn’t have a final else clause.

• The variable declared as reg doesn’t have any value
assigned upon entry to the combinatorial always block, and the
variable is assigned in an if statement without the else clause.

• A case construct is not complete and has no default clause, or the
variables assigned within the case were not
assigned a default value before entering the case statement.
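The first two causes listed above, and one way of avoiding them, can be sketched as follows. The module name latch_demo and its signals are hypothetical and used only for illustration.

```verilog
// Hypothetical example: one block that infers a latch, one that does not.
module latch_demo (sel, in1, out_latch, out_comb);
  input  sel, in1;
  output out_latch, out_comb;
  reg    out_latch, out_comb;

  // No else clause: out_latch must hold its value when sel is 0,
  // so a latch is inferred.
  always @(sel or in1)
    if (sel) out_latch = in1;

  // Default assignment on entry: every path assigns out_comb,
  // so only combinatorial logic is inferred.
  always @(sel or in1) begin
    out_comb = 1'b0;
    if (sel) out_comb = in1;
  end
endmodule
```

The same default-assignment approach applies to case statements: assigning every output before the case removes the need to cover each branch individually.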

2.9.3 What are “combinatorial timing loops”? Why should they be 
avoided? 

Combinatorial timing loops are hardware loops in which the output of 
either a gate or a long combinatorial path is fed back as an input to the same 
gate or to another gate earlier in the combinatorial path. These paths are 
generally created unintentionally when a variable from one combinatorial 
block is used to drive a signal that is used in the same combinatorial block 
from which the variable was derived. This typically happens in large size 
combinatorial blocks, wherein it is difficult to visually track that a loop is 
getting created. 

These combinatorial feedback loops are undesirable for the following 
reasons: 

• Since there is no clock edge in between to break the path, the
combinatorial loops can keep oscillating indefinitely, producing a
square waveform whose duty cycle is dependent upon the sum of
ON delays and OFF delays across the combinatorial path. For
example, the following code is a combinatorial loop:

assign out1 = out1 & in1;

This will cause out1 to be fed back combinatorially as one of
the inputs.

• These loops cause a problem in testability, since they can inhibit the 
propagation of the logic forward. 

Combinatorial loops can be caught quite early by one of the following 
means: 

• Periodic use of linting tools throughout the development process. 
This is by far the best and easiest way to catch and fix loops early in 
the design cycle. 

• During functional simulation, the desired output behavior doesn’t 
appear in the output, or the simulation doesn’t proceed ahead at all, 
because the simulator is hung. 

• If the loop is undetected during simulation, many synthesis tools have 
suitable reporting commands, which detect the presence of a loop. 
Note that synthesis tools proceed with the static timing analysis by 
breaking the timing arc of the loop for critical path analysis. 
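When feedback such as the assignment shown earlier in this answer is actually intended, one way to remove the combinatorial loop is to place a flop in the feedback path. The following is a minimal sketch under that assumption; the module name sticky_and and the reset value are hypothetical.

```verilog
// Hypothetical registered version of a feedback assignment.
module sticky_and (clk, rst_n, in1, out1);
  input  clk, rst_n, in1;
  output out1;
  reg    out1;

  // The clock edge breaks the loop: out1 is updated once per cycle
  // from its previous value, so there is no combinatorial feedback.
  always @(posedge clk or negedge rst_n)
    if (!rst_n) out1 <= 1'b1;
    else        out1 <= out1 & in1;
endmodule
```

Here out1 stays at 1 after reset until in1 is sampled low, and then remains 0 until the next reset.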

2.9.4 How does the sensitivity list of a combinatorial always block 
affect pre- and post- synthesis simulation? Is this still an issue 
lately? 

With Verilog-1995, between synthesis and simulation, it is important to 
have all elements that are in the RHS of the statements, or used within 
conditional statements, to be part of the sensitivity list of a combinatorial 
always block. 

While the synthesis tools go ahead and make use of the nets that are not 
in the sensitivity list, simulation will ignore change on those nets during 
logic evaluation. As a result, the behavior seen during functional simulation 
and post synthesis is different. 

Typically, text editors like emacs with a Verilog language mode have
been able to automatically infer the right nets, and automatically add them to
the sensitivity list. Linting tools will provide error messages for missing
nets during parsing of the RTL code.
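The mismatch described above can be reproduced with a small example. The module name sens_demo and its signals are hypothetical.

```verilog
// Hypothetical example of an incomplete versus a complete sensitivity list.
module sens_demo (in1, in2, out_bad, out_good);
  input  in1, in2;
  output out_bad, out_good;
  reg    out_bad, out_good;

  // in2 is missing from the list: simulation re-evaluates out_bad only
  // when in1 changes, while synthesis still builds an AND gate of
  // in1 and in2.
  always @(in1)
    out_bad = in1 & in2;

  // Complete list: simulation matches the synthesized AND gate.
  always @(in1 or in2)
    out_good = in1 & in2;
endmodule
```

In RTL simulation, a change on in2 alone updates out_good but leaves out_bad stale, which is exactly the behavior the gate-level netlist will not show.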

From Verilog-2001 onwards, this is not an issue anymore. The language
now has an implicit event_expression list, which adds all nets and variables
read by procedural and timing control statements into the sensitivity list. The
event_expression is indicated by @(*). For example, in the following
combinatorial block, all elements of the RHS are in the sensitivity list, as
required by Verilog-1995.

always @(in1 or in2 or in3 or in4)
begin
    out1 = (in1 ^ in2) & (in3 | in4);
end

The same in Verilog-2001 can be written in two ways, as: 

// note the use of "," in the place of "or"
always @(in1, in2, in3, in4)
begin
    out1 = (in1 ^ in2) & (in3 | in4);
end

or

always @(*) // note the use of "*"
begin
    out1 = (in1 ^ in2) & (in3 | in4);
end

The same code in SystemVerilog can be represented using the 
always_comb procedure, as follows: 

always_comb
begin
    out1 = (in1 ^ in2) & (in3 | in4);
end

Note that the code is now simpler, relieving the user of keeping track of
the sensitivity list. It is much more maintainable and readable, too.
The key advantage of using the always_comb procedure over the implicit
sensitivity list of @(*) is that the former is executed right from time 0,
like an assign statement, whereas the latter waits for an event to trigger
its activation. The simulation and synthesis tools figure out the elements
of the sensitivity list automatically. Check with your simulation and
synthesis tool vendor for the support of SystemVerilog and this construct.

2.10 Coding techniques for Area Minimization



This section describes some techniques that use RTL coding and a
parameterized approach to minimize the area of a soft IP. Not every user
needs the entire logic of the IP, so optimizing away the unwanted logic is
useful in large SoCs. Removing unwanted logic not only reduces silicon
area, but also reduces switching activity, and, hence, the power, too.

2.10.1 How do the `ifdef, `ifndef, `elsif, `endif constructs aid in
minimizing area?

The proper use of compiler directives like `ifdef, `ifndef, etc. can help in
minimizing the area after elaboration and during logic optimization.
Since compiler directives are evaluated at compile time, the choice is
static for the whole simulation session and for synthesis.

The following is an example of how the compiler directives can be used
for minimizing the area of a logic design.

`define MIN

module area_min_byifdef (in1, in2, in3, in4, out1);

input in1, in2, in3, in4;
output out1;

`ifdef MIN
    assign out1 = in1 & in2; // minimal area
    // more related logic to MIN
`else // larger area
    assign out1 = (in1 & in2) | (in3 ^ in4);
    // more related logic to not-MIN
`endif

endmodule

Note that compiler directives can also be used to select the instantiations
of modules, and, hence, can be helpful to pick a module with the
appropriate area. For example, in the following code, the compiler
directive is used to pick the type of adder, that is, ripple carry or
carry lookahead, depending upon whether CLA is defined.

// `define CLA

module area_min_byifdef (in1, in2, cin, sum, cout);

input in1, in2, cin;
output sum, cout;

`ifndef CLA
    ripple_adder U_ripple (
        .in1 (in1), .in2 (in2), .in3 (cin),
        .sum (sum), .cout (cout)
    ); // smaller area, longer timing
`else
    cla_adder U_cla (
        .in1 (in1), .in2 (in2), .in3 (cin),
        .sum (sum), .cout (cout)
    ); // larger area, faster timing
`endif

endmodule

In the above example, `ifndef was used to illustrate the absence of a
define for CLA. Note that the `define for CLA has been commented out. If
it is uncommented, then the carry lookahead adder instantiation gets
selected.

Hence, in this approach, the selection of the appropriate “section” of code
during parsing and elaboration decides the final area of the implementation.

2.10.2 What is “constant propagation”? How can I use constant 
propagation to minimize area? 

Constant propagation is a very effective technique for area minimization, 
since it forces the synthesis tools to optimize the logic in both forward and 
backward directions. Since the area minimization is achieved using 
constants, this technique is called constant propagation. An example of 
constant propagation is shown below: 

module const_prop (in1, in2, out1, out2);

input in1, in2;
output out1, out2;

parameter create_logic = 0;

assign out1 = (create_logic == 1) ?
              in1 & in2 : 1'b0;

assign out2 = (create_logic == 1) ?
              in1 | in2 : 1'b0;

endmodule

Note that create_logic is a parameter within the module that controls the
logic backwards from both the outputs out1 and out2. It could also
control the logic forward from the inputs in1 and in2, by adding internal
wires that select either the direct input or 1'b0. An example of how
the forward constant propagation works is as follows:

wire int_in1, int_in2;

assign int_in1 = (create_logic == 1) ? in1 : 1'b0;
assign int_in2 = (create_logic == 1) ? in2 : 1'b0;

assign out1 = int_in1 & int_in2;
assign out2 = int_in1 | int_in2;

When this parameter is 0, it forces logic zero in the assign statements,
and the logic zero propagates in either direction. As a result, no logic
gets enabled, and the logic is optimized away in synthesis. When this
parameter is 1, the logic is synthesized.

Note that constant propagation remains effective when the parameter is
overridden by any of the parameter override techniques.

Hence, the default value of the parameter can be set to 1, and be
overridden to 0, by different parameter overriding techniques, when required
to minimize the area.
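
The override can be sketched as follows. This is an illustration under
stated assumptions, not code from the original text: const_prop_d1 is a
hypothetical copy of the const_prop idea with the default changed to 1, and
const_prop_top, U_full, U_min and the port names of the top level are
hypothetical, too.

```verilog
// Hypothetical variant of const_prop with the default set to 1
module const_prop_d1 (in1, in2, out1, out2);
  parameter create_logic = 1;
  input  in1, in2;
  output out1, out2;

  assign out1 = (create_logic == 1) ? in1 & in2 : 1'b0;
  assign out2 = (create_logic == 1) ? in1 | in2 : 1'b0;
endmodule

module const_prop_top (a, b, full1, full2, min1, min2);
  input  a, b;
  output full1, full2, min1, min2;

  // Default parameter value: the AND and OR logic is kept
  const_prop_d1 U_full (.in1(a), .in2(b), .out1(full1), .out2(full2));

  // Named parameter override (Verilog-2001 style): the outputs become
  // constant zero, so synthesis has nothing to build for this instance
  const_prop_d1 #(.create_logic(0)) U_min
                (.in1(a), .in2(b), .out1(min1), .out2(min2));

  // Alternative override style, shown only as a comment:
  // defparam U_min.create_logic = 0;
endmodule
```

With this arrangement each instance decides its own area, while the module
source stays unchanged.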

SystemVerilog has also introduced a new construct, const, which declares
a variable as a constant. The const construct can be used to enforce constant
propagation, just like other constants such as parameter. For example, the
example above can be rewritten by replacing the parameter with const, as
follows:

module test_const (in1, in2, out1, out2);

input in1, in2;
output out1, out2;

const bit create_logic = 1;

assign out1 = (create_logic == 1) ?
              in1 & in2 : 1'b0;

assign out2 = (create_logic == 1) ?
              in1 | in2 : 1'b0;

endmodule

The output with the const construct above is exactly the same as when a 
parameter is used. 

2.10.3 What happens to the bits of a reg which are declared, but not 
assigned or used? 

When any of the bits of a reg declaration are unused, the logic
corresponding to those bits gets optimized away. For example, in the
following code, bits [2:1] are unused, although int_tmp is declared
to be [3:0]. This code will synthesize the logic for bits [3] and [0], and
no logic for bits [2:1].

module lower (in1, in2, clk, reset, out1);
input in1, in2;
input clk, reset;
output [1:0] out1;

reg [3:0] int_tmp;

always @(posedge clk or negedge reset)
begin
    if (!reset)
        int_tmp <= 0;
    else
    begin
        // Only bits 0 and 3 are used.
        // Bits [2:1] are not assigned
        int_tmp[0] <= in1;
        int_tmp[3] <= in2;
    end
end

// Only the used bits reach the output
assign out1 = {int_tmp[3], int_tmp[0]};
endmodule

2.10.4 How does the generate construct help in optimal area? 

Verilog-2001 generate can be useful in area optimization techniques.
This construct must be coded within a module scope. Unlike the `define
based approach, this construct allows the use of a variable, declared using
the genvar construct, to control the logic generated.

The generate construct can be used in three ways within the
generate-endgenerate scope: with a for loop, with the conditional if-else
construct, or with the conditional case construct. The “generate for” usage
helps in instantiating precisely the right amount of logic in a reusable
design. The “generate if” usage determines whether the logic should get
generated at all. The amount of logic is precisely controlled using the
construct and its variable.

This is best illustrated using examples, as follows. The first is the use of
an if-else clause within the generate-endgenerate constructs. The usage is
very similar to that of `ifdef, as discussed earlier in FAQ 2.10.1. Note
that the genvar construct is not required in this case.

module if_generate (in1, in2, out1);
parameter xor_logic = 1;

input in1, in2;
output out1;

wire out1;
generate
    if (xor_logic == 1)
        assign out1 = in1 ^ in2;
    else
        assign out1 = in1 & in2;
endgenerate



endmodule

In the above example, depending upon the value of the parameter xor_logic,
either the XOR gate or the AND gate gets generated. Although this is a very
simple use of this construct, the same approach can be extended to multiple
statements through the use of begin-end blocks. The parameter can be
overridden using the defparam construct, too. Note that even instantiations
can be controlled in the if or else clause.
The second way to use the generate construct is with for loops. The
variable used in the for loop has to be declared as a genvar. The variable
is then used in a for loop which instantiates the exact number of modules
required.

module andit (in1, in2, out1);
input in1, in2;
output out1;
wire out1;

assign out1 = in1 & in2;
endmodule

module forgen_test (in1, in2, out1);
parameter width = 4;
input [width-1:0] in1, in2;
output [width-1:0] out1;

wire [width-1:0] out1;

genvar i; // variable for the for loop

generate for (i = 0; i < width; i = i + 1)
begin : AND_BLOCK
    andit U1 (in1[i], in2[i], out1[i]);
end endgenerate

endmodule

The “AND_BLOCK” block identifier is required for any hierarchical
names generated by the concatenation of the block identifier and the variable
value as {generate_block_identifier, genvar_value}. In this case, the
hierarchical names generated were:

AND_BLOCK[0].U1 // Lowest index of the for loop
AND_BLOCK[1].U1

AND_BLOCK[3].U1 // Highest index of the for loop

These generated names can be used in hierarchical path names, just as in
a hierarchical design. The above example saves a lot of code for explicit
instantiations, especially if the variable size is large. Also, it allows
good control over how many instances are created, through the parameter
width. Thus, precise area control can be achieved. Note that this simple
AND module can be extended for more complex hierarchies, too.

The third way to use the generate construct is through the case statement
within the generate-endgenerate scope. This allows selective branching to
take place through the case statement, and, hence, controls which section
of the code is finally ‘generated’. An example of the conditional
selection of a module instantiation through a case statement is as follows:

module anding (in1, in2, out1);
input in1, in2;
output out1;
wire out1;

assign out1 = in1 & in2;
endmodule

module oring (in1, in2, out1);
input in1, in2;
output out1;
wire out1;

assign out1 = in1 | in2;
endmodule

module xoring (in1, in2, out1);
input in1, in2;
output out1;
wire out1;

assign out1 = in1 ^ in2;
endmodule

module casegen_test (in1, in2, out1);

input in1, in2;
output out1;
wire out1;

parameter operation = 0;

generate
    case (operation)
        0 : anding U1 (in1, in2, out1);
        1 : oring U1 (in1, in2, out1);
        2 : xoring U1 (in1, in2, out1);
        default : anding U1 (in1, in2, out1);
    endcase
endgenerate

endmodule

A few important points of the above example are: 

• The case condition “operation” has to be a constant, or a genvar 
variable in order to make a definitive decision during the conditional 
instantiation. Otherwise, it is a syntax error, since the tools will 
encounter this as an unknown value during elaboration. Hence, the 
approach is useful in parameterised designs. 

• Depending upon the value of operation above, the output either gets 
anded, ored, or xored. But, eventually only one of them will happen. 
This is useful in scenarios where a selective implementation needs to 
be instantiated, and the instantiation of that module can be selectively 
controlled. 

2.10.5 What is the difference between using `ifdef and generate for
the purpose of area minimization?

As discussed in the earlier questions, both `ifdef and generate constructs
can be used for the purpose of area minimization. The differences between
the two, when used for area minimization, are summarized in the following
table:

Table 2-11. Difference between using `ifdef and generate to minimize area

`ifdef                                  | generate
----------------------------------------+----------------------------------------
Construct can be used inside and        | Construct has to be used only within
outside the scope of module definition  | the scope of module definition
----------------------------------------+----------------------------------------
Construct works on the boolean presence | Construct can use the value of a
or absence of a define of the `ifdef    | variable using the genvar construct
variable                                | when used in a for or case construct
----------------------------------------+----------------------------------------
Useful only in equivalence to the       | The genvar variable can be used in a
if-else construct, and cannot perform   | for loop or case branch, to allow
any looping or branching operations     | multiple or selective instantiation of
                                        | variables and modules



2.10.6 Can the generate construct be nested? 

No. The generate construct cannot be nested. It is a syntax error to try to
nest the generate-endgenerate construct.

However, the if, case, and for constructs within the generate-endgenerate
scope can be nested. The constructs can also be used within one another,
that is, if within case, for within if, etc.

You can also use multiple non-nested generate-endgenerate constructs
within the module.
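
A minimal sketch of an if-else nested inside a for loop, all within a single
generate-endgenerate scope, is shown below. The module name nested_gen, the
parameter use_xor, and the block names are hypothetical and are used only
for illustration.

```verilog
module nested_gen (in1, in2, out1);
  parameter width   = 4;
  parameter use_xor = 0;   // hypothetical: 1 selects XOR, else AND
  input  [width-1:0] in1, in2;
  output [width-1:0] out1;

  genvar i;

  generate
    for (i = 0; i < width; i = i + 1)
    begin : BIT
      // if-else nested inside the for loop of the same generate scope
      if (use_xor == 1)
      begin : XOR_BIT
        assign out1[i] = in1[i] ^ in2[i];
      end
      else
      begin : AND_BIT
        assign out1[i] = in1[i] & in2[i];
      end
    end
  endgenerate
endmodule
```

Only one generate-endgenerate pair is used here. The nesting happens between
the for and if constructs inside it.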

2.11 Coding for Better Static Timing Optimization 

When a design gets compiled into a netlist, the various elements of the
delays in the path, like the cell delay, routing delay, etc., together
decide the overall performance of the chip. The timing impact of the
design should be factored in very early in the design process, during
functional partitioning and coding of the design. It will be too late to
consider timing impacts later in the functional verification cycle.

This section discusses a few topics on the different factors impacting the 
static timing of the design. 

2.11.1 What is a critical path in a design? What is the importance of 
understanding the critical path? 



A critical path is the path through a circuit that has the least slack. It is
not necessarily the longest path in the design. There can be more than one
critical path in a design. In fact, every path for which the required time
minus the arrival time at the endpoint is negative (that is, negative
slack) is a violating path.

Understanding and identifying the critical path in a design is important 
for the following reasons: 

• It helps fix the static timing problems, especially when the endpoint is a 
D input to a flip-flop, and the critical path delay is violating the setup 
time requirement for the flop. 

• Shortening the critical path delay obviously improves frequency and, 
hence, the performance of the logic. 

If the critical path is identified early in the design flow, then appropriate 
functional changes can be done early on in the project to terminate the path 
to the D input of a flop at an appropriate point in the path. This point has to 
be carefully chosen, considering the side effects in latency and static timing 
that would arise due to the staging of the path through a flop. 

If the source of the critical path is from a primary input, it is 
recommended to register the input. Although this could add to the latency, 
this strategy will eventually help in improving the frequency of operation. 
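
Registering the primary inputs can be sketched as follows. This is an
illustrative example only: the module reg_inputs and the signal names in1_r
and in2_r are hypothetical, and an active-low asynchronous reset is assumed.

```verilog
module reg_inputs (clk, reset_n, in1, in2, out1);
  input  clk, reset_n, in1, in2;
  output out1;
  reg    in1_r, in2_r, out1;

  always @(posedge clk or negedge reset_n)
  begin
    if (!reset_n)
    begin
      in1_r <= 1'b0;
      in2_r <= 1'b0;
      out1  <= 1'b0;
    end
    else
    begin
      in1_r <= in1;             // input registers terminate the input path
      in2_r <= in2;
      out1  <= in1_r & in2_r;   // internal logic now starts at a flop output
    end
  end
endmodule
```

The output appears one clock later than it would without the input
registers, which is the latency cost mentioned above.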

2.11.2 How does proper partitioning of design help in achieving 
static timing? 

Partitioning a design correctly helps in multiple stages of the design, all
the way through the backend flow. The best approach is to plan the
partitioning of the design before writing HDL code. It is important to
account for these considerations early on, to avoid hierarchical, port, or
logic changes late in the design. The following are some of the criteria for
design partitioning:

1. Logical partitioning: Partitioning the modules by close logical
associativity is a very common approach. This way, the design is both
easy to debug and modular. Typically, the partitioning boundaries are
datapaths (registers and glue logic), control (FSMs), memories and I/O.
Logistically, it also helps with having multiple team members do thorough
unit level verification, and it helps with better design management and
version control. All combinatorial logic associated with the same clock
domain should also be kept together within the same module. Inter-module
partitions can restrict logic optimization by synthesis tools, because
hierarchical boundaries prevent any combining of related logic. Typically,
a module size should be around 5K gates.

For synthesis tools to consider resource sharing and freedom to 
optimize, all relevant resources need to be within one level of 
hierarchy. If the resources are not within one level of hierarchy, 
synthesis tools cannot make tradeoffs to determine whether or not 
the resources should be shared. 

It is during the logical partitioning that the designer has the freedom
to decide upon the registering of the outputs between critical
inter-module hierarchies. This will immensely reduce the possibilities of
long combinatorial paths and combinatorial snake paths in the
design, and, hence, improve static timing.

Special function logic, like the pads, I/O drivers, clock generation 
and boundary scan should be at separate logical hierarchies. 

Any on-chip memories, like SRAMs or DRAMs should be placed at 
the top level. This will make the physical design interaction and 
floorplanning tasks more effective by better timing analysis. 

2. Goal based partitioning: Partitioning based on the different design goals
of speed and area helps the tools optimize each module appropriately.
Modules with different goals can be given their respective constraints
during synthesis.

3. Clock domain partitioning: Partitioning the logic according to clock
domain plays an important role in synthesis, static timing analysis, and
scan insertion. The inter-clock false paths can be confined to a single
synchronizer module, so that every other module lies within a single
clock domain.

4. Reset based partitioning: If a particular SoC has multiple resets, then
it is a good idea to consider reset based partitioning, too. This helps
all the storage elements within the module to wake up gracefully at
the same time.

2.11.3 What does it mean to “retime” logic between registers? How
does it affect functionality?

Retiming is the process of relocating registers across logic gates, without 
affecting the underlying combinatorial logic structure. This process is 
achieved by borrowing logic from one time frame and lending it to the other, 
while maintaining the design behavior. When you have a pipelined design, 
for example, in a datapath of a design, then retiming is a technique for 
reducing the critical path within the pipeline. 

Retiming has the following benefit:

• It helps in balancing the paths between the pipeline stages.

Retiming also has potential restrictions as follows: 

• Note that, although retiming can be used to reduce the critical path 
between the pipeline registers, it cannot be used to reduce the latency of 
the design. 

• A retimed design may not be formally equivalent to the original design. 
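
The idea can be sketched with two hand-written versions of the same
two-stage pipeline. This is an illustration only: the modules pipe_before
and pipe_after and their signal names are hypothetical, and in practice the
register movement is normally done by the synthesis tool rather than by
hand.

```verilog
// Before retiming: all the logic sits between the two register stages
module pipe_before (clk, a, b, c, d, y);
  input  clk, a, b, c, d;
  output y;
  reg    a_r, b_r, c_r, d_r, y;

  always @(posedge clk)
  begin
    a_r <= a;                            // stage 1: plain input registers
    b_r <= b;
    c_r <= c;
    d_r <= d;
    y   <= ((a_r & b_r) | c_r) ^ d_r;    // stage 2: AND, OR and XOR
  end
endmodule

// After retiming: the stage 1 registers have moved across the AND/OR logic
module pipe_after (clk, a, b, c, d, y);
  input  clk, a, b, c, d;
  output y;
  reg    abc_r, d_r, y;

  always @(posedge clk)
  begin
    abc_r <= (a & b) | c;                // stage 1: AND and OR
    d_r   <= d;
    y     <= abc_r ^ d_r;                // stage 2: XOR only
  end
endmodule
```

Both versions produce y two clock edges after the inputs are applied, so the
latency is unchanged, while the logic depth per stage is better balanced in
the second version. The registers of the two versions no longer correspond
one-to-one, which is one reason why an equivalence check that relies on
matching registers may fail.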

2.11.4 Why is one-hot encoding preferred for FSMs designed for 
high-speed designs? 

Since there is one explicit FF per state of a one-hot encoded state
machine, there is no need for output state decoding. Hence, the only
anticipated delay is the clock-to-Q delay of the FF. This makes the one-hot
encoding mechanism preferable for high-speed operation.
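
A minimal sketch of a one-hot state machine is shown below. The module
onehot_fsm, its three states, and its ports are hypothetical and serve only
to illustrate that the outputs can be taken directly from the state
flip-flops. An active-low asynchronous reset is assumed.

```verilog
module onehot_fsm (clk, reset_n, start, busy, done);
  input  clk, reset_n, start;
  output busy, done;

  // Bit positions of the three states (hypothetical state names)
  parameter IDLE = 0, RUN = 1, FINISH = 2;

  reg [2:0] state;   // one flip-flop per state

  always @(posedge clk or negedge reset_n)
  begin
    if (!reset_n)
      state <= 3'b001;                  // only the IDLE bit is set
    else
    begin
      // Each next-state bit depends on only a few current-state bits
      state[IDLE]   <= (state[IDLE] & ~start) | state[FINISH];
      state[RUN]    <= state[IDLE] & start;
      state[FINISH] <= state[RUN];
    end
  end

  // Outputs come straight from state flip-flops, without decoding logic
  assign busy = state[RUN];
  assign done = state[FINISH];
endmodule
```

With a binary encoding, busy and done would each need a comparator on the
state vector. Here they are single flip-flop outputs.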

2.12 Design for Testability (DFT) considerations 

Design for Testability or DFT techniques are design efforts that need to
be considered upfront during the design phase, to ensure that the design
under test is eventually testable. This process could increase the area as
the price of increasing the fault coverage. With proper DFT considerations
upfront, the test generation/development time and the time on the tester
can be reduced. While the pin count could increase by a few pins for better
fault coverage, DFT provides better observability and controllability,
which are the key considerations for good testability.

The following FAQs discuss a few factors that can affect the testability
and fault coverage of a design, and what the DFT techniques are.

2.12.1 What are the main factors that affect testability of a design?

The following are some of the main factors affecting the testability and 
fault coverage of a design: 

• Presence of tri-state buses in the design 

• Reset of a FF driven by the output of another FF 

• Presence of derived clocks in the design 

• Presence of gated clocks in the design 

• Presence of latches in the design 

Each of the above issues is discussed in the following FAQs.

2.12.2 My chip has on-chip tri-state buses. What are the testability 
implications, and how do I take care of it? 

Normally, tri-state buses shouldn’t be present within the chip, as they
consume more power. However, if tri-state buses are present inside a
chip, care should be taken to avoid bus contention, that is, multiple
drivers driving different values at the same time. Contention affects the
power, since the bus conflict will drain huge currents, and can cause damage
to the chip. To avoid bus contention during the scan testing phase, the
enable to each tri-state buffer should be controllable, that is, by AND’ing
it with the scan enable signal. In normal mode, the active-low scan_en_n
signal is de-asserted (logic 1), to allow the control to flow through, but
in test mode, it is asserted (logic 0), and the drivers are disabled to
avoid contention. The control inputs to these enables are assumed to
originate from the outputs of FFs. This is shown in the following figure:



[Figure: two tri-state buffers whose mutually exclusive enable inputs,
control_in1 and control_in2, are each gated with scan_en_n]

Figure 2-14. Tri-state and DFT

Verilog sample code for these buffers is illustrated in the following:

// scan_en_n is active low: 1 in normal mode, 0 in scan test mode
assign wire1 = (control_in1 & scan_en_n) ?
               in1 : 1'bz;

assign wire1 = (control_in2 & scan_en_n) ?
               in2 : 1'bz;

2.12.3 Some Flip-Flops in my chip have their resets driven by other 
Flip-Flops within the chip. How will this affect the testability, 
and what’s the workaround? 

Normally, the asynchronous set or reset of a FF is controlled by the
primary reset input pin. Sometimes it becomes inevitable to have the output
of one FF drive the asynchronous set/reset of another FF. In that case,
during the scan testing, if the driving FF gets a pattern such that it resets
the driven FF, it will destroy the data in the driven FF. To prevent this,
the reset should be OR’ed with a test_mode signal. The following figure
illustrates this mechanism:



[Figure: the output of FF1 is OR’ed with test_mode to drive the
asynchronous reset of FF2]

Figure 2-15. Reset and DFT

The test_mode primary input is de-asserted (in this case 1'b0) during
normal operations. However, during testing, the test_mode signal is
asserted to 1'b1, thus deactivating the asynchronous reset. This will
avoid corruption of the FF2 output when a predictable pattern is being sent
from FF1 to FF2.
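
A minimal Verilog sketch of this arrangement is shown below. It assumes
active-low asynchronous resets. The module name reset_dft and the signal
names ff1_q, ff2_reset_n, d1, d2 and q2 are hypothetical.

```verilog
module reset_dft (clk, reset_n, test_mode, d1, d2, q2);
  input  clk, reset_n, test_mode, d1, d2;
  output q2;
  reg    ff1_q, q2;
  wire   ff2_reset_n;

  // FF1: its output is used as the active-low reset of FF2
  always @(posedge clk or negedge reset_n)
  begin
    if (!reset_n)
      ff1_q <= 1'b1;
    else
      ff1_q <= d1;
  end

  // OR with test_mode: in test mode the reset of FF2 is held inactive
  assign ff2_reset_n = ff1_q | test_mode;

  // FF2: reset by FF1 in normal mode only
  always @(posedge clk or negedge ff2_reset_n)
  begin
    if (!ff2_reset_n)
      q2 <= 1'b0;
    else
      q2 <= d2;
  end
endmodule
```

With test_mode at 1'b1, ff2_reset_n stays high whatever value is shifted
into FF1, so the contents of FF2 survive the scan operation.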

2.12.4 I have derived clocks in my chip. What are the testability 
implications, and what is the workaround for it? 

Derived clocks are generated by clock dividers through Flip-Flops or
PLLs in a chip. Since these are derived from within the chip, there should be
a control input from the primary pins, to avoid the Flip-Flops capturing data
when they are not supposed to.

In this case, a multiplexor needs to be added in the clock path, with the
control being the test_mode, and the inputs to the multiplexor being the
regular clock and the derived clock. That way, the final clock to the
Flip-Flops is controllable between the derived clock in the normal mode and
the regular clock in the test mode. Note that the test_mode signal doesn’t
change dynamically, and, hence, it is okay to have a multiplexor in the
clock path. Note, too, that any time the clock is switched, all the
Flip-Flops need to be reset, to have a known starting value, and to avoid
spurious capture of data on their data lines.

The following diagram illustrates the implementation:

[Figure: a multiplexor controlled by test_mode selects between clk and the
clock divider output, and drives the clock input of the FF]

Figure 2-16. Multiplexor in clock path using derived clocks
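
The multiplexor in the clock path can be sketched in Verilog as follows.
This is an illustration under stated assumptions: the derived clock is a
simple divide-by-2 of clk, the reset is active low, and the names
clk_mux_dft, div_clk and ff_clk are hypothetical.

```verilog
module clk_mux_dft (clk, reset_n, test_mode, d, q);
  input  clk, reset_n, test_mode, d;
  output q;
  reg    div_clk, q;
  wire   ff_clk;

  // Derived clock: divide-by-2 of the regular clock
  always @(posedge clk or negedge reset_n)
  begin
    if (!reset_n)
      div_clk <= 1'b0;
    else
      div_clk <= ~div_clk;
  end

  // Regular clock in test mode, derived clock in normal mode
  assign ff_clk = test_mode ? clk : div_clk;

  // The FF is reset whenever the mode, and hence the clock, is switched
  always @(posedge ff_clk or negedge reset_n)
  begin
    if (!reset_n)
      q <= 1'b0;
    else
      q <= d;
  end
endmodule
```

In test mode the FF is clocked directly from the primary clk pin, so its
clock is fully controllable from outside the chip.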

2.12.5 My chip is power sensitive, and, hence, there are gated clocks 
in it. What are its testability implications and workaround? 

Gated clocks are inevitable in some designs to save power. Since the
clock now passes through combinatorial logic, the gated clock is no longer
controlled from a primary input, making it impossible to scan in the data.

The workaround is to logically OR a test_enable pin with the enabling
input of the AND gate that gates the clock. See FAQ 2.13.5 in this
chapter for more details on implementing this workaround.

2.12.6 What are the implications of combinatorial feedback loops in
design testability?

The presence of feedback loops should be avoided at any stage of the
design, by periodically checking for them, using the lint or synthesis
tools. The presence of a feedback loop causes races and hazards in the
design, and leads to unpredictable logic behavior. Since the loops are
delay-dependent, they cannot be tested with any ATPG algorithm. Hence,
combinatorial loops should be avoided in the logic.

2.12.7 How does the presence of latches affect the testability, and 
what’s the workaround? 

Since the enable to a latch isn’t the regular clock going to the rest of the 
Flip-Flops in the design, its output is not controllable directly from a primary 
input. In order to bring controllability to the latch, the enable to the latch 
needs to be OR’ed with a primary input pin like test_mode, as shown in 
the following figure. 



[Figure: a latch with data input data_in, whose enable is the OR of enable
and test_mode]

Figure 2-17. Latch with OR’ed test enable

This way, the latch can be forced to become transparent when the test 
data needs to be forced into it. 
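
A minimal Verilog sketch of the latch with the OR’ed test enable is shown
below. The module name latch_dft and the signal names data_in, enable,
latch_en and q are hypothetical, and an active-high latch enable is assumed.

```verilog
module latch_dft (data_in, enable, test_mode, q);
  input  data_in, enable, test_mode;
  output q;
  reg    q;
  wire   latch_en;

  // In test mode the latch is forced to be transparent
  assign latch_en = enable | test_mode;

  // Level-sensitive latch: q follows data_in while latch_en is high
  always @(latch_en or data_in)
  begin
    if (latch_en)
      q <= data_in;
  end
endmodule
```

With test_mode at 1'b1, q simply follows data_in, so the latch behaves like
a buffer for the test patterns.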

2.13 Power Reduction considerations 

Power reduction is a critical requirement in the design of chips that are
used in battery-operated devices. The more power a chip uses, the hotter it
operates, and the slower it runs. The reliability of the chip also decreases
at higher temperatures. This section discusses how RTL can be used to
influence the power dissipation within a chip, and what issues need to be
considered when coding for power saving.

2.13.1 What are the various methods to contain power during RTL 
coding? 

Any switching activity in a CMOS circuit creates a momentary current
flow from VDD to GND during the logic transition, when both N and P type
transistors are ON, and, hence, increases power consumption.

The most common storage element in designs is the synchronous FF. Its
output can change whenever its data input toggles and the clock triggers.
Hence, if these two can be asserted in a controlled fashion, so that the
data is presented to the D input of the FF only when required, and the
clock is also triggered only when required, then the switching activity,
and, automatically, the power, is reduced. The following bullets
summarize a few mechanisms to reduce the power consumption:

• Reduce switching of the data input to the Flip-Flops. 

• Reduce the clock switching of the Flip-Flops. 

• Apply area reduction techniques within the chip, since they reduce the
number of gates/Flip-Flops that toggle.

The following FAQs discuss in depth how each of the above can be 
implemented in RTL. 

2.13.2 Illustrate how reducing the switching of data input to the
Flip-Flops helps in power reduction.

In a circuit where the Flip-Flops need to be updated very rarely compared
to the frequency of the clock, it is appropriate to update the FF only at
that time, and avoid the switching of its output at all other times. This
can be achieved through an enable FF, as shown in the following figure:




If the control input comes from a state machine which can track exactly
when this FF has to be enabled to capture the new input data, then the enable
to the multiplexor can switch the multiplexor towards the input data.
Otherwise, the multiplexor feeds the previous stable output from Q back into
the data input of the FF.

Verilog RTL that implements the enable FF is as follows:

module enable_ff (clk, sel, reset_n, in1, out1);
input reset_n, sel, clk, in1;
output out1;

reg out1;

always @(posedge clk or negedge reset_n) begin
    if (!reset_n)
        out1 <= 1'b0;
    else if (sel)
        out1 <= in1;
    else
        out1 <= out1;
end

endmodule

The above style can be incorporated in the designs by following a coding 
convention for the flip-flops. But, this technique alone is not sufficient as a 
power reduction technique, as it has a drawback which is discussed in the 
next FAQ 2.13.3. 

2.13.3 What is the drawback of using the enable flip-flop to reduce 
the power consumption? 

Although the switching of the data is reduced using enable Flip-Flops,
the clock is still running to these and to a large number of other
Flip-Flops.

One side effect of the enable FF method is that it introduces logic into
the path to the D input, and possibly adds to the delay, if the D input
was the endpoint of a critical path.

The other side effect is that the area increases if these Flip-Flops happen
to be the storage elements of a large bank of registers.

2.13.4 Illustrate an example of clock gating to help in reduction of
power.

Clock gating is a common mechanism to save power. This technique
reduces the switching activity by:

• Eliminating the need for reloading the same value in the register
during multiple clock cycles.

• Reducing the clock network power dissipation.

The most common method of clock gating is through the use of a latch
and a gate. The following figure illustrates the implementation of this
mechanism:



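[Figure: the control input passes through a latch enabled by clk; the latch
output is AND’ed with clk to produce the gated clock, which clocks the FF
between data and output]

Figure 2-19. Using latch for clock gating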

When clk is in its low phase, the latch is enabled. The control input,
which actually decides whether to gate the clock or not, is now propagated
through the latch to its Q output. Here, if the control input is high, the Q
of the latch is high during the low phase, and remains so until the next low
phase of clk. This keeps the AND gate enabled. In the meantime, when the
rising edge of clk arrives, it gets propagated to the gated clock net. This
happens cleanly, without any glitches, because the latch output is stable
for sufficient time to meet the Flip-Flops’ setup requirements. When the
control input goes low, it disables the AND gate and, hence, prevents clk
from being propagated to the gated clock net. This keeps the gated clock
net at 0 without any switching activity.

A simple Verilog module that illustrates the above logic is as follows. 
Note that in large designs containing many FFs, this strategy is best 
implemented through the synthesis tools rather than manually. 

module gated_ff (in1, cntrl_in, clk, reset_n, out1); 

  input cntrl_in, in1, clk, reset_n; 
  output out1; 

  wire gated_clk; 
  reg d_latch, out1; 

  // Latch: transparent during the low phase of clk 
  always @(cntrl_in, clk) begin 
    if (!clk) 
      d_latch <= cntrl_in; 
  end 

  assign gated_clk = d_latch & clk; 

  // Flip-flop clocked by the gated clock 
  always @(posedge gated_clk or negedge reset_n) begin 
    if (!reset_n) 
      out1 <= 1'b0; 
    else 
      out1 <= in1; 
  end 

endmodule 

The main reason for using a latch is to prevent glitches on the 
gated_clk net: the latch output can change only during the low phase of 
the clock, when the AND gate output is already held at 0. 

Although the above illustration shows only one FF, the gated clock can 
actually drive all the remaining Flip-Flops in its clock domain. Also, the 
gating element here is an AND gate because the latch is enabled during the 
low phase of the clock and the Flip-Flop is triggered by the rising edge. 
This gate can change if either of these two polarities changes, that is, 
the logic level that enables the latch, or the active edge of the clock, 
whether it is rising or falling. 

The logic shown within the dashed box will require being instantiated 
multiple times, depending upon how many branches the main clock tree has. 
Depending upon the buffering, clock skew, and loading, many such 
instances could be placed on each branch of the clock tree or at the root 
level. 
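
As a minimal sketch of the point that one gated clock can serve many Flip-Flops, the following module shares a single latch-and-AND gate across a small register bank. The module name gated_reg_bank, the load_en port, and the WIDTH parameter are assumptions made for illustration and do not come from the original example.

```verilog
// Hypothetical example: one latch-based clock gate shared by a
// WIDTH-bit register bank. All names and widths are illustrative.
module gated_reg_bank #(parameter WIDTH = 8) (
    input  wire             clk,
    input  wire             reset_n,
    input  wire             load_en,   // plays the role of cntrl_in
    input  wire [WIDTH-1:0] d,
    output reg  [WIDTH-1:0] q
);
    reg  en_latch;
    wire gated_clk;

    // The latch is transparent while clk is low, so en_latch can only
    // change while the AND gate output is already held at 0.
    always @(clk or load_en)
        if (!clk)
            en_latch <= load_en;

    assign gated_clk = en_latch & clk;

    // All WIDTH flip-flops share the single gated clock.
    always @(posedge gated_clk or negedge reset_n)
        if (!reset_n)
            q <= {WIDTH{1'b0}};
        else
            q <= d;
endmodule
```

In this arrangement the cost of the latch and the gate is paid once per bank rather than once per bit, which is why the technique pays off mainly for wide registers.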

2.13.5 What are the side effects of latched clock gating logic, and 
how is it fixed? 

Although the use of clock gating through latches is a good way to save 
power, it introduces a testability problem. Due to the latch, the 
controllability of the gated clock signal is reduced, that is, the gated 
clock signal is now at the mercy of the control input only. During scan 
testing, if this signal is low, it disables the propagation of the clock 
itself. 

To resolve this issue, additional logic needs to be added to enhance the 
testability. One way to increase the controllability of the gated clock is 
to introduce a control point at the input of the latch, so that the latch 
output is “ON” during scan testing. This is illustrated in the following 
figure: 




[Figure 2-20. Using latch for clock gating. Labeled signals: data, output] 

Based on the OR gating above, the scan enable signal will override the 
control input, such that the output of the latch enables the AND gate to 
propagate the clk net into the input of the Flip-Flops. 

A simple Verilog module that illustrates the above implementation is as 
follows. Note that synthesis tools can implement this logic automatically 
for all the FFs in a large design, so that it need not be done manually. 

module gated_ff (in1, scan_en, clk, reset_n, cntrl_in, out1); 

  input scan_en, in1, clk, reset_n, cntrl_in; 
  output out1; 

  wire gated_clk, latch_in; 
  reg d_latch, out1; 

  // Control point: scan_en overrides cntrl_in at the latch input 
  assign latch_in = scan_en | cntrl_in; 

  // Latch: transparent during the low phase of clk 
  always @(latch_in, clk) begin 
    if (!clk) 
      d_latch <= latch_in; 
  end 

  assign gated_clk = d_latch & clk; 

  always @(posedge gated_clk or negedge reset_n) 
    if (!reset_n) 
      out1 <= 1'b0; 
    else 
      out1 <= in1; 

endmodule 



In some situations, the test tools used in the foundries don’t support 
the control point before the latch and require it to be placed after the 
latch. To meet such a foundry requirement, the above circuit can easily be 
changed to position the OR gate after the latch, as shown below: 




[Figure 2-21. Latch controllability after the output. Labeled signals: data, output] 

A simple Verilog code illustrating the above is as follows: 

module gated_ff_out_or (in1, scan_en, clk, reset_n, cntrl_in, out1); 

  input scan_en, in1, clk, reset_n, cntrl_in; 
  output out1; 

  wire gated_clk, clk_gate; 
  reg d_latch, out1; 

  // This is the latch 
  always @(cntrl_in, clk) begin 
    if (~clk) 
      d_latch <= cntrl_in; 
  end 

  // scan_en overrides the latch output 
  assign clk_gate = scan_en | d_latch; 
  assign gated_clk = clk_gate & clk; 

  // This is the gated ff 
  always @(posedge gated_clk or negedge reset_n) 
    if (!reset_n) 
      out1 <= 1'b0; 
    else 
      out1 <= in1; 

endmodule 
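
The effect of the scan override can be checked with a small testbench. The sketch below is self-contained and assumes a hypothetical cell, clk_gate_cell, that places the OR gate after the latch as described above. The testbench names and the 10 ns clock period are also assumptions.

```verilog
`timescale 1ns/1ns

// Hypothetical clock-gate cell with the scan override after the latch.
module clk_gate_cell (input wire clk, input wire enable,
                      input wire scan_en, output wire gated_clk);
    reg en_latch;
    always @(clk or enable)
        if (!clk) en_latch <= enable;   // transparent in the low phase
    assign gated_clk = (en_latch | scan_en) & clk;
endmodule

module tb_clk_gate_cell;
    reg  clk = 1'b0;
    reg  enable;
    reg  scan_en = 1'b0;
    wire gated_clk;
    integer pulses = 0;

    clk_gate_cell dut (.clk(clk), .enable(enable),
                       .scan_en(scan_en), .gated_clk(gated_clk));

    always #5 clk = ~clk;

    // Count the clock pulses that reach the gated clock net.
    always @(posedge gated_clk) pulses = pulses + 1;

    initial begin
        // Drive enable low shortly after time 0, while clk is still low,
        // so that the latch captures a known value.
        #1 enable = 1'b0;
        repeat (3) @(negedge clk);
        $display("functional mode, enable low: pulses = %0d", pulses);

        // scan_en forces the gate open regardless of enable.
        scan_en = 1'b1;
        repeat (3) @(negedge clk);
        $display("scan mode, gate forced open: pulses = %0d", pulses);
        $finish;
    end
endmodule
```

With enable held low, no pulses should reach gated_clk in the first phase. Once scan_en is asserted, one pulse per clock cycle should be counted. Note that scan_en is changed while clk is low, which mirrors the glitch-free condition discussed for the latch.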

2.13.6 What are a few other techniques of power saving that can be 
achieved during the RTL design stage? 

The following design considerations during RTL coding help in the 
reduction of power within the logic: 

• Run high frequency signals through as few intermediate logic levels as 
possible. This way, only those cells which need to be run at high 
frequency switch, and the rest of the logic can run at a relatively lower 
frequency. This would require multi clock design within the chip, 
preferably where the clocks are integral multiples of each other. The safe 
approach would be to route one master clock into the chip, and generate 
its sub clocks within the chip. 

• Only use as many Flip-Flops as required to store the data values, that is, 
if only 4 bits of a 32 bit register are going to be used, it is not required to 
register the remaining 28 bits. Normally the additional unused FFs will 
be optimized away by the synthesis tools. 

• Gate the inputs using a select line. For example, the address lines from a 
CPU are continuously changing, and may not always refer to your 
device. In that case, it is better to gate the rest of the logic following the 
address lines with a signal like chip-select, which reduces unnecessary 
switching activity. The chip select can be generated from one central 
address decoder. Although this decoder is switching all the time, it 
helps avoid unnecessary switching in lots of other logic distributed 
elsewhere. 

• Choose Gray coding for state machines instead of binary encoding: 
Since only one bit changes at any Gray transition, the number of Flip- 
Flops switching, and the switching in the logic that it drives, is reduced. 
Note that this would potentially require more Flip-Flops than the binary 
encoded approach. Hence, for the most frequent transition arcs, use Gray 
coded transitions. Focus the gray coding efforts on common return to 
zero state transitions. 

• Choose a multiplexor instead of on chip tri-state buses: The biggest issue 
of on chip tri-state buses is the bus contention. Since there is a high 
possibility of one buffer beginning to drive the interconnect before the 
other has finished, there is a small window in which potentially opposite 
polarities are driven. This causes a transient short circuit on the internal 
bus. The choice of a multiplexor avoids the bus contention, but it could 
potentially add to the number of gates and logic path. Consider 
registering the inputs that come from these long paths. Tri-state buses 
also require internal pull up resistors and higher current signal drivers. 
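
The Gray coding recommendation above can be sketched as follows. The state machine, its state names, and its ports are hypothetical. The only point illustrated is that each arc of the main loop changes exactly one state bit.

```verilog
// Hypothetical 4-state controller; state names and ports are illustrative.
module gray_fsm (
    input  wire clk,
    input  wire reset_n,
    input  wire start,
    input  wire done,
    output wire busy
);
    // Gray-coded states: each arc in the main loop flips exactly one bit.
    localparam [1:0] IDLE = 2'b00,
                     REQ  = 2'b01,
                     XFER = 2'b11,
                     ACK  = 2'b10;

    reg [1:0] state;

    always @(posedge clk or negedge reset_n)
        if (!reset_n)
            state <= IDLE;
        else
            case (state)
                IDLE:    if (start) state <= REQ;   // 00 -> 01
                REQ:                state <= XFER;  // 01 -> 11
                XFER:    if (done)  state <= ACK;   // 11 -> 10
                ACK:                state <= IDLE;  // 10 -> 00
                default:            state <= IDLE;
            endcase

    assign busy = (state != IDLE);
endmodule
```

A binary encoding of the same loop (00, 01, 10, 11) would toggle two bits on the 01 to 10 arc and on the 11 to 00 arc, so the Gray assignment reduces the switching in the state register and in the logic it drives.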

2.13.7 What are a few system level techniques, apart from RTL, 
that can influence in the reduction of power for the chip? 

Having discussed a few techniques of saving power through RTL in the 
above FAQs, the following are a few system level variables that can 
influence power reduction: 

• Reducing operating voltage: Since power consumed is directly 
proportional to the square of the voltage, operating at a lower voltage is 
one way of saving power. Many of the semiconductor vendors have 
libraries that are designed specifically for low power. However, note that 
there could be side effects in the static timing when the low power 
libraries are used. 

• Reducing operating frequency: Since the power consumed is directly 
proportional to the frequency, consider operating at a lower frequency 
with increased bus widths to maintain the required data rate. For 
example, the rate of data transfer of a 32 bit bus at 100MHz is the same 
as that of a 64 bit bus at 50MHz. Note that additional design corner 
cases get introduced as the widths increase, especially in non-aligned 
byte transfer scenarios. 

• Running the I/O voltage different from the core voltage: In this 
technique, the I/O ring of cells works at a different voltage from 
the rest of the core cells. This allows interfacing to signals external 
to the chip that have different voltage requirements than the core. It 
also isolates the core from output-transition noise. 

• Lower the capacitance of the routing network, especially for high 
frequency signals. 
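
The voltage and frequency relationships above can be made concrete with a short worked calculation. The supply voltages of 3.3 V and 2.5 V are assumed values chosen only for illustration. The bus widths and clock rates are the ones quoted in the text.

```latex
% Dynamic power scaling and data rate; the voltages are illustrative assumptions.
\begin{align*}
P_{\mathrm{dyn}} &\propto C\,V^{2} f \\
\frac{P(2.5\,\mathrm{V})}{P(3.3\,\mathrm{V})}
  &= \left(\frac{2.5}{3.3}\right)^{2} \approx 0.57
  \quad \text{(same } C \text{ and } f\text{)} \\
32\,\mathrm{bits} \times 100\,\mathrm{MHz}
  &= 64\,\mathrm{bits} \times 50\,\mathrm{MHz}
  = 3.2\,\mathrm{Gbit/s}
\end{align*}
```

Under these assumptions, lowering the supply alone removes roughly 43% of the dynamic power, while the wider and slower bus keeps the data rate unchanged.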

2.13.8 What are a few power reduction techniques that can be 
achieved through static timing? 

Power reduction can be achieved in all stages of the chip design process, 
that is, through RTL techniques such as clock gating, synthesis tools 
optimizing away unused logic, reducing the capacitance of the routing 
network during backend, and also through good static timing. The following 
are a few considerations on how power can be reduced through static timing: 

• Control clock skew between logic gate inputs. 

• Ensure that flip-flop inputs meet setup and hold time requirements, to 
avoid extended output settling transitions caused by metastability. 

2.13.9 What are a few power reduction techniques that can be 
implemented during the backend analysis? 

The following are a few parameters within the chip that can significantly 
influence the overall power consumption, which can be taken care of during 
the backend phase: 

• Have shorter routes for power and timing sensitive logic: Since the 
capacitance of a routing net is a function of the length, width and 
impedance of the route, a long route typically has higher capacitance 
than the shorter alternative. Since dynamic power consumed is directly 
proportional to the capacitance, that is, P = CV²f, lower capacitance 
means lower power. This would mean the logic blocks need to be closer 
to each other. 

• Reduce excessive loading: Heavily loaded nets cause higher capacitance 
and higher power consumption. 

2.13.10 What are a few power reduction techniques that can be 
implemented during board design? 

The following are a few techniques that can reduce power consumption 
at the board level: 

• Reduce the chip interconnection dynamic power by limiting the 
number of I/O pins, the loading on each pin, and the average 
frequency at which each pin toggles. 

• Minimize the trace lengths between the chip’s outputs and other 
device inputs. 

SUMMARY 

This chapter discussed how the various Verilog constructs get inferred 
during synthesis, and the static timing implications. The chapter also 
discussed a few techniques on area reduction, and issues on testability and 
power. 




Chapter 3 

VERIFICATION 



INTRODUCTION 

This chapter addresses the use of Verilog constructs for verification 
purposes. The topic of verification is vast, and this chapter touches on 
four aspects of the verification phase. It begins with a discussion on 
messaging, which is the mechanism for communicating from the test 
environment to the users. It then discusses the importance of a monitor in 
the environment, as a block that checks for and reports errors that could 
be missed if the checking were done by humans. The chapter then proceeds to 
the purpose of the Bus Functional Model (BFM), and how it can be used for 
stimulus creation. It concludes with a discussion of how random stimulus 
generation can be achieved using Verilog constructs. 

3.1 Messaging 

Messaging is a mechanism to convey useful information in the form of 
messages during the simulation run. These messages can be used to convey a 
wide range of severity levels, from simple information to critical failure. In 
general, keeping the user informed of the proceedings during a session of 
simulation is a necessary requirement while developing the models. Note 
that these messages are implemented using non-synthesizable constructs. 

3.1.1 What are a few considerations while implementing messaging 
in a model? 

Messaging, although it sounds simple, forms an important criterion during 
the design of a verification model. The following are a few considerations 
that need to be incorporated during its implementation within a model: 

• Identifying the level of a message in terms of criticality: Note 
that not all messages are equally important. Accordingly, all the 
messages to be conveyed need to be classified into different 
message levels. Some messages are only informative, and fall 
into the INFO category. Other messages are critical, and fall 
into the FATAL category. Look at the next FAQ for more details 
on the levels. 

• Controllability of the message: Not all the messages need to be 
displayed to the users all the time. The user must have a 
mechanism to enable or disable the different levels of the 
messages. This avoids cluttering the log files with too many 
informational messages, which might be trivial to the user and 
cause something important to be missed. The controllability of 
the messages is typically achieved using a configuration 
command within the models. Note that this controllability is 
typically global, that is, the messages are either enabled or 
disabled across all the models, and it is rarely made selective. 

• Timestamp of occurrence: The timestamp of when the message 
occurs will be useful to begin debugging at that timestamp. 

• Message identification: All the messages should have some 
kind of ID. This is useful in two ways: all messages can be 
documented in the user manual using this ID, and the unique ID 
is a convenient means of communication among team members, 
rather than sending the full string of the message. Another 
requirement while framing the message ID is to prepend the 
acronym of the module from which the message is generated. 
For example: 

HBFM_PKTBGN: INFO: Beginning a new packet at t=105ns 

The above message indicates the following information: 

    It is printed from the Host Bus Functional Model (HBFM). 
    It is of the severity level INFO. 
    The unique ID is “PKTBGN”, informing the beginning of a new packet. 
    The string is “Beginning a new packet”. 
    It happens at timestamp 105ns. 

The user can simply use the operating system grep command over the 
log file to identify INFO messages, or PKTBGN to see how many packets 
were initiated. 

• Language domain of the message: The domain in which the 
message is constructed is also an important decision. Some 
messages are simple enough to be constructed within the HDL 
using the $display system task. For example: 

$display ("HBFM_PKTBGN: INFO: Beginning a new packet at t=%0d ns", $time); 

Sometimes these messages are implemented as a PLI call, with 
the displaying function written in the C language. In the following 
example, $my_display is a user-defined PLI routine: 

$my_display (HBFM_PKTBGN, INFO, "Beginning a new packet at t=", timestamp); 

Note above that the ID, level, string, and timestamp are all 
arguments to the user-defined system call that displays the 
message. 
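
For the timestamp consideration, standard Verilog offers the $timeformat system task together with the %t format specifier, so that the time unit need not be hard-coded in every message string. The following is a minimal sketch. The module name msg_time_demo and the 1ns/1ps timescale are assumptions.

```verilog
`timescale 1ns/1ps

// Hypothetical demo module; only the message format follows the text.
module msg_time_demo;
    initial begin
        // Report times in ns (10^-9 s), with no decimal places, a " ns"
        // suffix, and no minimum field width.
        $timeformat(-9, 0, " ns", 0);
        #105;
        $display("HBFM_PKTBGN: INFO: Beginning a new packet at t=%0t", $time);
    end
endmodule
```

With these settings the timestamp is expected to appear as 105 ns. Changing the first argument of $timeformat rescales every %t timestamp in the simulation without touching the individual message strings.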

3.1.2 What are the different kinds of message severity levels? 

As discussed in the earlier FAQ 3.1.1, message levels are simply 
different hierarchies of message identification, based on criticality. These 
levels are chosen to avoid the user’s getting overwhelmed with too many 
messages that may not be of critical interest. These levels vary from user to 
user, in terms of what is defined as critical. 



Typically, the following are the different levels of messaging: 

• INFO: The message associated in this level is simply an 
informational string. This level is typically used to convey the 
beginning or end of a sequence. 

• WARNING: This message level indicates something unusual 
happening. The level doesn’t stop the simulation session from 
proceeding ahead, but warns the user of an unexpected behavior. 

• ERROR: This message level indicates something is wrong. This 
message is displayed when an erroneous scenario is detected, and 
the simulation may be terminated soon afterwards. 

• FATAL: This message level indicates a catastrophic situation, 
and warrants the termination of the simulation. This message 
level may not have the privilege of running the simulation ahead 
for a few more ticks. This message is typically displayed when 
there is a catastrophic functional scenario or a showstopper. 

As required for the project, more message levels can be defined to 
augment the above basic list. 
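
One possible way to attach behavior to these levels is sketched below: INFO and WARNING only print, ERROR prints and counts towards a limit, and FATAL stops the simulation at once. The module name, the task name report, and the MAX_ERRORS parameter are hypothetical, and the error-limit policy is an assumption rather than something prescribed above.

```verilog
// Hypothetical severity handling; names and the error budget are assumptions.
module msg_severity_demo;
    localparam INFO = 0, WARNING = 1, ERROR = 2, FATAL = 3;
    parameter  MAX_ERRORS = 2;

    integer error_count;
    initial error_count = 0;

    task report;
        input [31:0]     level;
        input [8*40-1:0] text;        // up to 40 characters
        begin
            case (level)
                INFO:    $display("INFO: %0s at t=%0d", text, $time);
                WARNING: $display("WARNING: %0s at t=%0d", text, $time);
                ERROR: begin
                    $display("ERROR: %0s at t=%0d", text, $time);
                    error_count = error_count + 1;
                    // Let the run continue until the error budget is used up.
                    if (error_count >= MAX_ERRORS) begin
                        $display("FATAL: error limit reached at t=%0d", $time);
                        $finish;
                    end
                end
                default: begin        // FATAL, or an unknown level
                    $display("FATAL: %0s at t=%0d", text, $time);
                    $finish;          // stop immediately
                end
            endcase
        end
    endtask

    initial begin
        #5 report(INFO,    "Beginning packet");
        #5 report(WARNING, "Unexpected idle cycle");
        #5 report(ERROR,   "CRC mismatch");
        #5 report(FATAL,   "Bus hang detected");
    end
endmodule
```

Treating an unknown level as FATAL is a deliberately conservative choice in this sketch, so that a mistyped level cannot silently hide a message.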

3.1.3 Illustrate an example of how message levels are implemented 
in a BFM. 

The following is an example of how the messages are implemented in 
Verilog HDL. Although each message could simply be a $display system 
call, a more structured approach using a Verilog task is recommended. 
Since the messages will be distributed in numerous places across 
different models, it would be tedious to change all of them if the 
format of the message had to be modified. With a task, a change made in 
one place is reflected everywhere the task is used: 

module test_msg (); 

  `define info  0 
  `define warn  1 
  `define error 2 
  `define fatal 3 

  task display_msg; 
    input [8*6-1:0]  mod_name;     // max 6 chars 
    input [8*8-1:0]  msg_id;       // max 8 chars 
    input [31:0]     level;        // any integer 
    input [8*50-1:0] stringvar;    // max 50 chars 
    reg   [8*7-1:0]  level_string; // temp reg, max 7 chars 
    begin 
      case (level) // converting integer into string 
        `info:   level_string = "INFO"; 
        `warn:   level_string = "WARNING"; 
        `error:  level_string = "ERROR"; 
        `fatal:  level_string = "FATAL"; 
        default: level_string = "UNKNOWN"; 
      endcase 

      // note the use of "\t" for tab 
      $display ("%0s_%0s:\t%0s: %0s at t=%0d", mod_name, 
                msg_id, level_string, stringvar, $time); 
    end 
  endtask // display_msg 

  initial begin 
    #5 display_msg ("HBFM", "PKTINFO",  `info,  "Beginning packet"); 
    #5 display_msg ("UBFM", "PKTWARN",  `warn,  "Warning packet"); 
    #5 display_msg ("ABFM", "PKTERROR", `error, "Erroneous packet"); 
    #5 display_msg ("IBFM", "PKTFATAL", `fatal, "Fatal packet"); 
  end 

endmodule // module test_msg 

The output of the above will be as follows: 

HBFM_PKTINFO:   INFO: Beginning packet at t=5 
UBFM_PKTWARN:   WARNING: Warning packet at t=10 
ABFM_PKTERROR:  ERROR: Erroneous packet at t=15 
IBFM_PKTFATAL:  FATAL: Fatal packet at t=20 

Since the above module test_msg is a generic one, it can be 
instantiated and used in several other modules as follows: 

module top_message_task (); 

  // Instantiating the basic message module 
  test_msg U_test_msg (); 

  initial begin 
    // Using hierarchical call to the display message task 
    #5 U_test_msg.display_msg ("HBFM", "PKTINFO", 
                               `info, "Beginning packet"); 
    #5 U_test_msg.display_msg ("UBFM", "PKTWARN", 
                               `warn, "Warning packet"); 
    #5 U_test_msg.display_msg ("ABFM", "PKTERROR", 
                               `error, "Erroneous packet"); 
    #5 U_test_msg.display_msg ("IBFM", "PKTFATAL", 
                               `fatal, "Fatal packet"); 
  end 

endmodule // top_message_task 

As evident in the above example, the string size is limited to the widths 
in the reg definitions. In the above case, a maximum of 6 characters can 
be specified for the mod_name variable. The size limitation can be 
overcome in two ways: 

• By parameterizing the size itself, rather than using fixed values. This 
gives users the flexibility to specify the sizes of the strings. 

• By using the SystemVerilog string type. String variables can be 
assigned string values directly, and automatically size themselves to 
the length of the string, instead of relying on the hardcoded byte 
widths shown above. The value can then be displayed using %s in the 
$display command. 
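
The first option, parameterizing the sizes, might look like the following sketch in plain Verilog. The module names msg_param and top_msg_param and the parameter names MOD_CHARS, ID_CHARS, and STR_CHARS, along with their default values, are all hypothetical.

```verilog
// Hypothetical parameterized message module; all names and defaults
// are illustrative assumptions.
module msg_param #(
    parameter MOD_CHARS = 8,
    parameter ID_CHARS  = 12,
    parameter STR_CHARS = 80
) ();
    task display_msg;
        input [8*MOD_CHARS-1:0] mod_name;   // MOD_CHARS characters
        input [8*ID_CHARS-1:0]  msg_id;     // ID_CHARS characters
        input [8*STR_CHARS-1:0] stringvar;  // STR_CHARS characters
        begin
            $display("%0s_%0s: %0s at t=%0d",
                     mod_name, msg_id, stringvar, $time);
        end
    endtask
endmodule

module top_msg_param;
    // Override the module-name width to allow up to 16 characters.
    msg_param #(.MOD_CHARS(16)) u_msg ();

    initial
        #5 u_msg.display_msg("HOST_BFM_PORT_A", "PKTINFO",
                             "Beginning packet");
endmodule
```

Each instance can choose its own widths, so a project with long module acronyms does not force wide string registers onto every other user of the task.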

3.2 Behavioral Functional Models (BFMs) 

This section discusses Bus Functional Models (BFM). The BFMs contain 
the tasks/methods at the executor level, which perform the various 
operations like read, write, etc., at different abstraction levels. The BFMs 
can also have the facility for initialization, configuration, reporting, etc. 

3.2.1 What is a Bus Functional Model (BFM)? 

BFM is all of the following: 

• Models different levels of abstraction of the system to be modeled 
(example, a serial communication protocol). 

• At the lowest level of abstraction (example, at the symbol-level 
or at the serial bit-stream abstraction), it is clock-cycle accurate. 

• At higher levels of abstraction, it need not necessarily be clock- 
cycle accurate. 

• It provides visibility into its communications processes at each 
level of abstraction. 

• It provides visibility into all of its “tunable” parameters. 

• It can be commanded to perform specified sequences of 
communications at any level. 

• It can be commanded to modify any of its “tunable” parameters. 

• It can be used to build up higher levels of software (example, 
device drivers). 

In a typical verification environment, the BFM helps produce the 
stimulus to the DUT, as illustrated in the following figure. This figure 
is a simplified version of a BFM implementation, and doesn’t illustrate 
the details of execution flow. At the user level, the constraints and the 
commands can either be read through a parser or be brought in through 
an `include file. 

Figure 3-1. A BFM in a typical verification environment 



3.2.2 What are a few considerations that go into designing a BFM? 

A Bus Functional Model is a key module within the verification system. 
Architecting the BFM for the testbench scenarios where it will be used is an 
important planning task. 

The following are a few considerations during the designing of a BFM 
that could be classified as “must-do” requirements. 

1. If the protocol or the functionality for which the BFM is being 
designed can be categorised into hierarchies, it is highly 
recommended to have an implementation that maps to this hierarchy. 
The Universal Serial Bus (USB) protocol is a classic example of such 
a requirement. Since the protocol specifies functionality that can be 
categorised into bit level, packet level, transaction level, and transfer 
level protocols, it is appropriate for the tasks behind the BFM 
commands to follow the same levels of hierarchy. This not only 
avoids repetition of the functionality of the various layers, but 
also makes the BFM highly reusable. 

2. Depending upon the layer of the command, that is, whether it is a 
transaction or transfer level command, there must be provision for 
built-in self-check capability. The users instantiating the BFM 
shouldn’t need to build the self-check from the outside. Any 
violations detected during this self-check must be clearly notified to 
the users. For example, the read_transfer command should 
have the automatic data check provision within it: 

read_transfer (arg1, arg2, ..., expected_data); 

In the above example, expected_data is compared against the 
data obtained during the read transfer. 

3. There must be provision to specify all the key variables within the 
BFM either in an `include file or through a user-defined configure 
Verilog task. These variables decide what features within the BFM 
need to be implemented, what values should override the defaults, 
etc. For example, 

configure (burst_support, `true); 

4. All the key configurable variables must have defaults, so that even if 
the user doesn’t want to change any of them, the BFM should 
gracefully work in these default conditions. These variables can be 
initialized using the initial construct, or, in SystemVerilog, the 
initialization can be done in-line, along with the declaration of the 
variable. For example, 

integer i = 5; 
reg [3:0] count = 4; 

Note that in Verilog-2001, in-line initialisation will cause a 
simulation event at time 0. An enhancement in SystemVerilog is that 
the in-line variable initialization doesn’t cause a simulation event, 
and, hence, provides a deterministic way of initialization of these in- 
line variables. 

5. Each BFM task call should include a status output. This is 
useful for implementing an optional flow control mechanism that 
can alter the flow of testing. The pseudo-code for this is as follows: 

write_transfer (arg1, arg2, ..., status); 

// altering the flow control here 
if (status == `failed) 
  <prepare_failure_recovery_sequence> 
else 
  <continue_with_rest_of_sequence> 

6. There must be provision for the user to filter the messages from 
the BFM using message levels. This can be in the form of commands 
that configure the severity of the messaging levels. For example: 

configure (message_level, `error); 

This will display only those messages that are at least as severe as 
ERROR, which helps to avoid too much clutter in the messages. 

7. If the functionality within the BFM task requires waiting on an event, 
there must be a timeout mechanism, in case an event or signal 
assertion from the DUT doesn’t arrive. The non-occurrence of the 
event should be notified as either a WARNING or an ERROR to the 
higher-level modules, and shouldn’t cause a hang in simulation. A 
snippet of code for the timeout detection is as follows, where 
timeout holds the number of clock cycles left to wait: 

if (start_of_event) begin 
  while (timeout != 0) begin 
    @(posedge clk); 
    if (event_detected) begin 
      $display ("Event detected at t=%0d", $time); 
      break; // SystemVerilog feature 
    end 
    timeout = timeout - 1; 
  end 

  if (!event_detected) 
    $display ("Timeout for start_of_event reached at t=%0d", $time); 
end 

8. All the inputs to the tasks/commands within the BFM should be 
checked against their legal ranges. If a value is out of range, the 
user should be notified of the illegal input value and the legal range. 
The execution of the task would normally terminate with an ERROR 
message. Illegal inputs shouldn’t cause a chain of functionally 
incorrect events. For example, in the following task, the input 
value is verified against its minimum and maximum values. It is 
recommended to use parameters for the extremes of the ranges, 
since they can then be modified by the user. 

parameter min_value = 0; 
parameter max_value = 10; 

task modify_value; 
  input [3:0] in1; 
  output int_val; 
  reg int_val; 
  begin 
    if (in1 >= min_value && in1 <= max_value) begin 
      int_val = |in1; 
      // Do the rest of the operation in a task 
    end else begin 
      $display ("argument in1 of %0d in task modify_value is out of range between %0d and %0d, t = %0d", 
                in1, min_value, max_value, $time); 
    end 
  end 
endtask 

Note that, in the above example, the checks for only the in1 input 
have been illustrated. The same can be extended for the rest of the 
inputs in this task. 

9. There must be provision for the user to provide some kind of ID to 
the BFM, so that, in a multiple instance scenario, the source of the 
messages can be uniquely identified. An example of how the BFM 
ID can be used has been illustrated in the earlier examples of 
messaging. 

10. The values of all the final assignments to the key variables should be 
displayed to the user, as an acknowledgement of the user’s 
constraints being accepted. This serves two purposes: 

a. It provides information on what these values were before the 
simulation run, and 

b. confirms the assignments of the constraints. 
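
Two of the requirements above, the built-in self-check (item 2) and the status output (item 5), can be combined in one task. The following is a minimal sketch under stated assumptions: a small memory array stands in for the data returned by the DUT, and the names bfm_selfcheck_demo, xfer_status, PASSED, FAILED, and the HBFM_RDCHK message ID are hypothetical.

```verilog
// Hypothetical BFM fragment; the memory array stands in for the DUT.
module bfm_selfcheck_demo;
    localparam PASSED = 1'b0, FAILED = 1'b1;

    reg [7:0] mem [0:15];   // stand-in for the DUT's readable storage
    reg       xfer_status;

    // Read one location, compare it with expected_data, and report
    // the outcome through the status output.
    task read_transfer;
        input  [3:0] addr;
        input  [7:0] expected_data;
        output       status;
        reg    [7:0] actual;
        begin
            actual = mem[addr];
            if (actual === expected_data)
                status = PASSED;
            else begin
                status = FAILED;
                $display("HBFM_RDCHK: ERROR: addr %0d read %0h, expected %0h at t=%0d",
                         addr, actual, expected_data, $time);
            end
        end
    endtask

    initial begin
        mem[3] = 8'hA5;
        #5 read_transfer(4'd3, 8'hA5, xfer_status);   // data matches
        #5 read_transfer(4'd3, 8'h5A, xfer_status);   // deliberate mismatch
        // Altering the flow based on the returned status.
        if (xfer_status == FAILED)
            $display("HBFM_RDCHK: INFO: starting failure recovery at t=%0d", $time);
    end
endmodule
```

The case-equality operator (===) is used so that an unknown value read back from storage is reported as a mismatch instead of being silently ignored.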

3.2.3 What is a typical flow in designing a BFM? 

The architecting and designing of a BFM for a verification system is an 
important task, to be planned carefully. The requirements that specify the 
design of the BFMs could be quite unique to each system. The following are 
a few steps that will be useful during the design of the BFMs and meet these 
diverse requirements of the BFMs for the various verification systems. 

At a core level of the BFM, that is, where the functionality of the BFM is 
being defined, the following are the recommended steps: 

1. Specify the abstraction level at which the BFM is planned to be 
used. At the interface level, it needs to be decided whether the BFM 
has to be clock cycle accurate in handshaking to the DUT. 

2. Specify the user level configuration parameters that influence the 
operation of the BFM. Typically, the static parameters, that is, the 
ones that don’t change throughout the simulation, are defined by the 
user in a file. The parameters that influence the simulation on an 
ongoing basis are typically defined through the configure 
commands. 

3. Specify the hierarchy of the commands when they are functionally 
dependent on a hierarchical fashion. The content and mechanism to 
pass information back and forth between these hierarchies need to be 
defined. This way, the division of functionality within these 
hierarchies also gets defined. 

4. Specify the commands/calls that the user will be able to call within 
this BFM. The task names, inputs/outputs/inouts for these tasks need 
to be defined. 

5. Specify the details of the messages that the BFMs convey. The 
details of the message string, debug levels, severity level, and token 
of the string need to be planned. This needs to be done with the tasks 
and methods that are planned to be within the BFM. 

6. Specify the interface ports that these BFMs use to interact with the 
DUT and the timing relationships that need to be maintained for the 
ports with regard to the tasks/methods. 

7. Develop the code within these tasks. This forms the “meat” of the 
BFMs. It will be useful to indicate the timestamp of the entry for 
these tasks. Try confining this content to the functionality only, and 
defer any timing checks or requirements outside this core as a 
wrapper. 

The above constitute the most basic steps. The users may need to 
customize the above based on the requirements in the project. At the usage 
level, the following steps need to be planned and implemented: 

1. Determine whether these tasks will be interpreted from a file 
through a parsing mechanism, or an `include mechanism, or an 
automatic stimulus generation mechanism. Accordingly, this could 
require the development of a file parser. The parsing mechanism 
helps in identifying user typos/errors upfront, before they are 
detected too late in the simulation runtime. However, this should not 
be the only way to access these task calls. The provision to access 
these commands in a hierarchical fashion should also be available 
for directed testing, random simulation, or a reactive test 
generation sequence. In the automatic stimulus generation 
scheme (for example, in random testing or reactive stimulus 
generation) the next command is not known a priori; it is decided 
on the fly, once the response to the current command is obtained. 

2. All the configurable parameters need to have defaults defined within 
the BFM. If the user doesn’t specify explicitly any parameter, then 
these defaults will apply. In a typical implementation, these defaults 
can be specified using the parameter construct with an initial value 
assignment.

3. Before the simulation proceeds, the values of key parameters that
influence the simulation need to be displayed. This will help the user 
to ensure that the simulation is on the right track. Any incompatible 
combination of parameter values should immediately be flagged and 
simulation terminated, according to its severity. 

4. The user must be able to choose whether or not the BFM messages are
logged into a file. In large regressions, disabling the log saves not
only disk space, but also regression runtime.

5. To avoid user errors, all the parameters and the input values must
be checked for valid ranges. Invalid values must be flagged as an error
and the simulation must be terminated immediately. 
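
Usage-level steps 2, 3 and 5 above can be sketched as follows. This is only a minimal illustration under stated assumptions, not a complete BFM: the module name bfm_cfg_sketch, the parameters ADDR_MIN, ADDR_MAX and LOG_TO_FILE, and the log file name are hypothetical and chosen purely for this example.

```verilog
// Hypothetical BFM fragment: parameter defaults, a start-up banner,
// and a sanity check on the configured values.
module bfm_cfg_sketch;

  // Defaults apply unless overridden at instantiation (assumed names)
  parameter ADDR_MIN    = 32'h0000_4000;
  parameter ADDR_MAX    = 32'h0000_a000;
  parameter LOG_TO_FILE = 0;

  integer log_fd;

  initial begin
    // Display the key parameters before any stimulus is driven
    $display("BFM config: ADDR_MIN=%0h ADDR_MAX=%0h LOG_TO_FILE=%0d",
             ADDR_MIN, ADDR_MAX, LOG_TO_FILE);

    // Flag an incompatible combination and stop immediately
    if (ADDR_MIN > ADDR_MAX) begin
      $display("ERROR: ADDR_MIN is greater than ADDR_MAX");
      $finish;
    end

    // Open a log file only when the user asks for one
    if (LOG_TO_FILE) begin
      log_fd = $fopen("bfm.log");
      $fdisplay(log_fd, "BFM log opened at t=%0d", $time);
    end
  end

endmodule
```

A user would override the defaults at instantiation, for example with bfm_cfg_sketch #(32'h100, 32'h200, 1) u_bfm ();.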

3.2.4 How can BFMs be used to inject intentional errors in the 
stimulus? 

From a user-level abstraction, this can be done via appropriate user level 
commands that the BFM provides. Internally, the BFM should be 
constructed such that it is capable of injecting errors into selected portions of 
its stimulus generation logic, when commanded to do so. Usually, internal 
to the BFM, error injection is handled as separate layers, one for each 
abstraction level, each working “below” the layer that is capable of 
conducting regular operations. This error injection layer will be activated 
upon setting of the appropriate internal flags (which are, in turn, set by 
appropriate user commands). Once activated, the error injection layer will 
intercept all stimulus produced by the layer above it, inject the appropriate 
errors (as commanded by the user), and pass the stimulus to the processes of 
the next abstraction level. 

With this approach, intentional errors can be injected into any level of 
abstraction of the system being modelled. This is particularly powerful 
when used in conjunction with constrained-random stimulus generation 
capabilities. 

One of the key requirements during intentional error injection is to
keep the user and the test environment informed of the intentional error and
the expected response. Otherwise, it will create false alarms during the
simulation run or debug phase. 
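
The layering described in this answer can be sketched as follows. The example is a hypothetical one: the module err_inject_sketch, the tasks set_parity_error and send_byte, and the choice of a parity error are all assumptions made for illustration, and are not taken from any particular BFM.

```verilog
// Hypothetical sketch: a user command arms a flag, and an error
// injection layer below the regular layer corrupts the stimulus.
module err_inject_sketch;

  reg       inject_parity_err;   // internal flag, set by a user command
  reg [7:0] tx_data;
  reg       tx_parity;

  // User-level command: arm or disarm the error injection
  task set_parity_error;
    input enable;
    begin
      inject_parity_err = enable;
    end
  endtask

  task send_byte;
    input [7:0] data;
    reg         parity;
    begin
      parity = ^data;                  // regular layer: correct parity
      if (inject_parity_err) begin     // error injection layer
        parity = ~parity;
        // Keep the user informed, to avoid false alarms during debug
        $display("INFO: intentional parity error injected at t=%0d", $time);
        inject_parity_err = 0;         // one-shot injection
      end
      tx_data   = data;
      tx_parity = parity;
    end
  endtask

  initial begin
    inject_parity_err = 0;
    send_byte(8'h3c);                  // clean transfer
    set_parity_error(1'b1);
    send_byte(8'h3c);                  // corrupted transfer
  end

endmodule
```

Clearing the flag after one use is a design choice made here so that a single user command corrupts a single transfer; a real BFM might instead keep the flag set until the user clears it.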

3.3 Bus Monitors

A bus monitor, as its name suggests, is a module that monitors or tracks 
the transactions going through the bus. The code for a monitor is a substitute 
for human monitoring, which is typically prone to errors. For example, in the 
following figure, the bus monitor is placed between the host bus and the 
DUT.

Figure 3-2. Bus monitor for a host bus

As seen from the above figure, the bus monitor typically has all the bus 
signals as inputs only. The outputs it typically drives are non-bus related, 
that is, sidebands for informing critical status as hardware signals. The bus 
monitor is behavioural code that is instantiated wherever the bus is being 
connected. 



3.3.1 What are the main responsibilities of a bus monitor? 

The main responsibilities of a bus monitor are as illustrated below: 

• Protocol checking: This is by far the most important responsibility
of the bus monitor. Since the bus monitor tracks all the activities
happening on the bus, it must check for:
o violations of the bus protocol
o any X or Z values in the signals to be monitored
o design latencies between necessary signals
o timing violations of setup/hold, etc. Note that this is
not strictly a protocol check.

Any time any of the above checks fails, the appropriate message
should be displayed to the user, and any associated sideband output
signal should be asserted. Typically, these checks are implemented
based on the rules governing protocol compliance. It is good
practice to manually comb through the specification of the host bus,
and extend these rules with additional checks that will benefit the
verification activity.

• Transaction logging: Since the bus monitor is already keeping track 
of the bus activity, it will be helpful to the user to log the bus 
activity as useful information. The logging must consist of the 
following: 

o Message with the timestamp of critical activity, like the 
beginning of the transaction, address, details of the 
transaction type, data written/read, and the transaction 
termination type. These messages could be either verbose, 
or simply indicate the logic levels of the address, data, and 
control signals. If verbose, it is a good idea to have controls 
on the message levels, based on configurable parameters.
o Facility to log the activity into a file or stdout or both.
o Since the messages within the monitor are unique to its
instance, in a multiple instance scenario, a provision should 
exist to specify a unique identifier for each instantiation. For 
the user to know from which monitor a particular message 
came, it would be useful to prepend the message with the 
instantiation ID of the monitor. The monitor can be defined 
such that the instantiation ID is a parameter. The user can 
then override this parameter for each instance, with unique 
values. 

• Sideband signals: Typically, a bus monitor is considered to
consist mainly of inputs. While this is true for the primary bus
signals that the monitor is connected to, it is sometimes useful to
convey important information to the testbench in the form
of hardware sideband signals. This helps the testbench take
immediate action. The presence of sideband signals is an
optional requirement, and specific to the needs of the monitor. One
typical sideband signal is the status bus, which has encoded in it the
various responses that logic in the testbench can interpret, and possibly
use to terminate the simulation. If there is no provision for sideband signals
for status outputs, then there must be a provision within the monitor
to provide tracking information to the higher-level module
instantiating the monitor. This could be as simple as a reg variable
of status, that can be accessed using the hierarchical access, instead 
of being a port. This flag will be useful to terminate the simulation. 
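
The hierarchical-access alternative described in the last bullet can be sketched as follows. The module names mon_sketch and tb_sketch are hypothetical, and the assignment at #10 merely stands in for a check failing inside a real monitor.

```verilog
// Hypothetical monitor that keeps its status flag as an internal reg
module mon_sketch;
  reg term_sim;              // status flag, deliberately not a port
  initial term_sim = 0;
endmodule

// Hypothetical testbench reading the flag through its hierarchical name
module tb_sketch;

  mon_sketch u_mon ();

  always @(posedge u_mon.term_sim) begin
    $display("Monitor requested termination at t=%0d", $time);
    $finish;
  end

  initial begin
    #10 u_mon.term_sim = 1;  // stands in for a failed check in the monitor
  end

endmodule
```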

3.3.2 Illustrate with an example, the design of a bus monitor. 

The monitor needs to be carefully designed, considering both the functionality
and the ease of use in a testbench environment. The following example
illustrates how a bus monitor is designed for an example application
interface, although the underlying principle can be extended to more
complex host buses. The interface and timing diagram for the bus monitor
are as follows:




Figure 3-3. Interfacing an application bus monitor 

The example application interface shown above contains numerous types
of transactions, indicated by the type[2:0] bus. For the purpose of
illustration, only a burst write transaction portion is implemented in the 
monitor. This application interface is a hypothetical interface, and does not 
represent any proprietary interface. 

The timing diagram of a burst transaction is shown in the following 
figure. The protocol for the write burst is as follows: 

• The transaction begins with the sampling of the start_xfr signal
with respect to clk; start_xfr must be asserted for only one clock.

• If rnw=0, it is a write transaction, else if rnw=1, it is a read. The
rnw must remain in its state throughout the burst, and is allowed to
change only during the next start_xfr.

• The type of transaction is indicated by the type [2:0] output bus, 
and remains valid throughout the burst. 

• The datao output bus is allowed to change only when dack is 
sampled. The data must otherwise remain constant. 

• The dack is a data acknowledge signal from the application. The
application can insert wait states by withholding dack when it is not
ready to accept the data.

Figure 3-4. Timing diagram of a burst write



The requirements in designing an application bus monitor for the above 
portion of the protocol are as follows. More specific requirements can be 
added, as per the functionality. 

1. Identify the checks for the protocol: 

There are numerous combinations of scenarios that can occur in this 
simple protocol that need to be monitored. However, for this 
illustration, the following three simple rules will be checked in this 
monitor: 

o During start_xfr sampling, all control signals other than
datao must not be X or Z. If any of them is X or Z, it is an
error. If any bit of datao is X or Z, there must be a
warning.

o Between the start and end of a burst, the type and rnw signals
should never change. Otherwise, it is an error and terminates
the simulation.

o The addr range must be within the min and max limits 
configured by the user. Otherwise, it is an error and 
terminates the simulation. 

2. Identify the messages, message levels, display format/template, and 
the message IDs. These should correspond to the list of checks 
identified to be monitored. The following are the message levels, 
messages, and the IDs to be implemented for this monitor: 



Table 3-12. Messages, levels, IDs for the bus monitor example

ID                  Level     Message
------------------  --------  ------------------------------------------
INV_ADDR            ERROR     addr found with x/z value during
                              sampling of start_xfr
INSTABLE_CONTROL    ERROR     <signal> found changing value between
                              the start and end of a burst.
                              [<signal> = type, rnw]
INV_DATA            WARNING   datao bus is found having x/z during
                              dack sampling
ADDR_OUT_OF_RANGE   ERROR     The addr lines found to have values out of
                              range specified by the user during
                              start_xfr sampling



3. Identify the user configurable parameters for the monitor. 

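This particular monitor has three user configurable parameters, that
is,

o inst_num : This parameter defines the instance number of the
monitor, which is displayed as part of each message.

o addr_min : This parameter defines the lower valid value of
the addr bus.

o addr_max : This parameter defines the upper valid value of
the addr bus.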

4. Identify the need and role of sideband signal outputs 

This monitor has a single output sideband signal, status[3:0],
which tells the top level testbench, at any given time, what the
status of the transaction is.
The following are a few encodings of the status bus:

4'b0000 : OKAY. No problems
4'b1000 : ERROR. Terminate simulation

All other values can be used for conveying any other information to
the testbench.

With the above description as a specification to the bus monitor, the 
following is an example code of implementation. 

// User configurable parameters

`define BEAT4 4
`define addr_min 8'ha0
`define addr_max 8'hf0

module app_mon (clk, reset_n, addr, type, datao,
                app_resp, start_xfr, rnw, dack,
                status);

parameter inst_num = 0;

input clk;
input reset_n;
input [7:0] addr;
input [2:0] type;
input [1:0] app_resp;
input [7:0] datao;
input start_xfr;
input rnw;
input dack;
output [3:0] status;

reg xn_in_prog, reset_det;
wire end_of_xfr, graceful_termination,
     exception_termination;
reg [2:0] count;
reg saved_rnw;
reg [2:0] saved_type;
reg term_sim;

// defines for the message severity levels
`define info 0
`define warn 1
`define error 2
`define fatal 3

task display_msg;
  input [(8*7)-1:0] mod_name;      // max 7 chars
  input [(8*20)-1:0] msg_id;       // max 20 chars
  input [31:0] level;              // any integer
  input [(8*100)-1:0] stringvar;   // max 100 chars
  reg [(8*7)-1:0] level_string;    // temp reg, max 7 chars
  begin
    case (level)
      `info:   level_string = "INFO";
      `warn:   level_string = "WARNING";
      `error:  level_string = "ERROR";
      `fatal:  level_string = "FATAL";
      default: level_string = "UNKNOWN";
    endcase
    $display("%0s_%0d_%0s: %0s: %0s at t=%0d\n", mod_name,
             inst_num, msg_id, level_string, stringvar, $time);
  end
endtask // display_msg

always @(posedge clk or negedge reset_n)
begin
  if (~reset_n) begin
    xn_in_prog <= 0;
  end else begin
    if (end_of_xfr)
      xn_in_prog <= 0;
    else if (start_xfr)
      xn_in_prog <= 1;
  end
end

initial begin
  reset_det = 0;
  term_sim = 0;
end

// This process infers what the transaction type is

always @(posedge clk or negedge reset_n)
begin
  if (~reset_n) begin
    count <= 0;
  end else begin
    if (start_xfr) begin
      if (dack) begin
        count <= 2;
      end else begin
        case (type)
          `BEAT4  : count <= 3;
          default : count <= 0;
        endcase
      end

      if ((addr < `addr_min) || (addr > `addr_max))
      begin
        display_msg("APP_MON", "ADDR_OUT_OF_RANGE", `error,
          "Address lines addr is out of the specified range addr_min and addr_max");
        term_sim <= 1;
      end
    end else if (dack)
      count <= count - 1;

    if (term_sim == 1'b1) // term_sim is a pulse
      term_sim <= 0;
  end // if
end // always

assign graceful_termination  = (count == 0) &
                               xn_in_prog & dack;
assign exception_termination = (app_resp != 0) &
                               xn_in_prog & dack;

// Users can add more status info
assign status = {term_sim, 3'b0};

assign end_of_xfr = (graceful_termination |
                     exception_termination);



always @(posedge clk)
begin
  if (~reset_n) begin
    display_msg("APP_MON", "RESET_DET", `info,
                "Synchronous reset seen asserted");
    reset_det <= 1;
  end else if (reset_n & reset_det) begin
    reset_det <= 0;
    display_msg("APP_MON", "RESET_DET", `info,
                "Synchronous reset seen deasserted");
  end

  // Checks for x's on critical signals during
  // transaction in progress. Note that the reduction XOR
  // also evaluates to x when any bit is z.
  if (xn_in_prog) begin
    if (^addr === 1'bx)
      display_msg("APP_MON", "INV_ADDR", `error,
                  "addr signal is x");
    if (^datao === 1'bx)
      display_msg("APP_MON", "INV_DATA", `warn,
                  "data signal is x");
    if (^type === 1'bx)
      display_msg("APP_MON", "INV_CNTRL", `error,
                  "type signal is x");
    if (rnw === 1'bx)
      display_msg("APP_MON", "INV_CNTRL", `error,
                  "rnw signal is x");

    // There is no easy way to check for a bit in a vector
    // being set to Z. So we will assume that there is a
    // function called "check_for_z(input_vector)", that
    // checks each bit of the input_vector and returns a 1 if
    // any bit is set to Z. If no bit of the input_vector is
    // set to Z, then the function check_for_z will return
    // a 0.
    if (check_for_z(addr))
      display_msg("APP_MON", "INV_ADDR", `error,
                  "addr signal is z");
    if (check_for_z(datao))
      display_msg("APP_MON", "INV_DATA", `warn,
                  "data signal is z");
    if (check_for_z(type))
      display_msg("APP_MON", "INV_CNTRL", `error,
                  "type signal is z");
    if (rnw === 1'bz)
      display_msg("APP_MON", "INV_CNTRL", `error,
                  "rnw signal is z");
  end

  if (start_xfr) begin
    display_msg("APP_MON", "XN_START", `info,
                "New transfer started");
    saved_type <= type;
    saved_rnw <= rnw;
  end

  if ((xn_in_prog | start_xfr) & ~dack) begin
    display_msg("APP_MON", "DATA_WAIT", `info,
                "Data wait state");
  end else if ((xn_in_prog | start_xfr) & dack) begin
    if (rnw)
      display_msg("APP_MON", "DATA_XFR", `info,
                  "Read Data transfer");
    else
      display_msg("APP_MON", "DATA_XFR", `info,
                  "Write Data transfer");
  end

  // Checks for INSTABLE_CONTROL ID
  if (xn_in_prog & dack) begin
    if (saved_type != type) begin
      display_msg("APP_MON", "INSTABLE_CONTROL", `error,
                  "type signal changed before end of transfer");
      term_sim <= 1;
    end
    if (saved_rnw != rnw) begin
      display_msg("APP_MON", "INSTABLE_CONTROL", `error,
                  "rnw signal changed before end of transfer");
      term_sim <= 1;
    end
  end

  if (end_of_xfr) begin
    display_msg("APP_MON", "XN_END", `info,
                "Transaction ends");
  end
end

endmodule // app_mon

A few salient points regarding the above monitor implementation are: 

• The above monitor example can be enhanced by adding more checks,
based on the specifications of the tests that the monitor is supposed to
check.

• The example hasn't implemented the debug severity levels for
messages, that is, the level from which messages need to be displayed to
the users. The users are encouraged to add severity levels to messages.

• The above simple illustration may need to be implemented in 
multiple files/tasks, based on how many checks need to be 
implemented and the list of features to be monitored. 
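
The monitor example assumes a check_for_z function without defining it. One possible sketch of such a function is given below. The wrapper module check_z_sketch and the fixed 32-bit argument width are assumptions made for this illustration; in the monitor, the function would be declared inside app_mon itself.

```verilog
// Hypothetical helper: bit-by-bit scan for Z using case equality
module check_z_sketch;

  reg [7:0] addr;

  // Returns 1 if any bit of vec is Z, else 0.
  // Accepts vectors up to 32 bits wide (an assumed limit).
  function check_for_z;
    input [31:0] vec;
    integer k;
    begin
      check_for_z = 1'b0;
      for (k = 0; k < 32; k = k + 1)
        if (vec[k] === 1'bz)
          check_for_z = 1'b1;
    end
  endfunction

  initial begin
    addr = 8'b0000_z001;
    if (check_for_z(addr))
      $display("addr contains a Z bit");
    addr = 8'h5a;
    if (!check_for_z(addr))
      $display("addr is free of Z bits");
  end

endmodule
```

The case equality operator (===) is used because the logical equality operator (==) returns x when either operand contains an X or Z bit.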

3.3.3 What other considerations go into designing a Monitor? 

While designing a bus monitor, the following are a few useful features 
that need to be considered during specification and implementation: 

• In a multiple instance scenario, there must be provision to indicate 
each monitor instance’s unique ID, so that the messages displayed 
can be unique, in order to know which instance displayed the 
message. 

• There must be provision within the monitor to specify the message 
debug level that filters the messages accordingly. 

• A provision must exist to specify all the variables in the monitor,
either in the form of a `include file, or as user definable
configure commands. 
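
The first two considerations above can be sketched as follows. This is a hypothetical fragment: the module msg_filter_sketch, the parameter min_level, and the simplified display_msg task are assumptions for illustration, and the level encoding (0 = info, 1 = warn, 2 = error, 3 = fatal) is simply taken over from the earlier monitor example.

```verilog
// Hypothetical monitor fragment with an instance ID and a
// user-settable message filter level.
module msg_filter_sketch;

  parameter inst_num  = 0;   // unique ID per instance
  parameter min_level = 1;   // messages below this level are suppressed

  task display_msg;
    input [31:0]       level;
    input [(8*40)-1:0] text;   // max 40 chars
    begin
      if (level >= min_level)
        $display("MON_%0d: level %0d: %0s at t=%0d",
                 inst_num, level, text, $time);
    end
  endtask

  initial begin
    display_msg(0, "Data wait state");    // suppressed when min_level = 1
    display_msg(2, "addr signal is x");   // displayed
  end

endmodule
```

Each instance can then be given its own ID and filter level, for example msg_filter_sketch #(3, 2) u_mon3 ();.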

3.4 Random stimulus generation 

Random stimulus generation or random simulation uses stimulus 
sequences to the DUT that are random in nature. Note that it is the sequence 
that is random. The stimulus for the random simulation is usually machine 
generated, unlike the directed tests, which are generated by the verification 
team members. It is important to run random simulation for the
following reasons:

• Since humans specify the directed test plans, it would be too idealistic to
assume a perfectly complete test plan, foreseeing all the possible
scenarios that the DUT would be subjected to in real life.

• Stimulus sequence generation by humans is time consuming
compared to machine-generated sequences.

• The randomness is used to depict real life scenarios, which do not
necessarily follow the sequences exercised in the directed tests.

• Since random testing covers a wider span of test scenarios, including
the hard-to-reach corners (that is, beyond what can be expected from
humans), the coverage of functional testing is much larger than with
directed sequences.

Although setting up a random simulation environment up front means it will
take longer for the first successful transaction/test to be up and running,
the returns from planning this environment are worth it in the long
run. Deferring random testing until the end of directed testing could bring
up surprises that reveal the need for major architectural changes late in
the design cycle.

3.4.1 Explain with an example, how do I generate random 
numbers in Verilog? 

Verilog has a useful system function, namely $random, which
returns a 32-bit signed random number every time it is called. The range can also be
constrained by using the modulus operator (%). An example demonstrating
this is as follows:

module rand_seq ();

integer rand, i, constr_rand, constr_range;

initial begin
  for (i = 0; i < 5; i = i + 1) begin
    rand = $random;
    $display("rand %0d = %0d", i, rand);
  end
end

initial begin
  constr_range = 10;
  $display("\nconstr_range = %0d\n", constr_range);
  for (i = 0; i < 5; i = i + 1) begin
    constr_rand = $random % constr_range;
    $display("constr_rand %0d signed = %0d, \t unsigned = %0d",
             i, constr_rand, $unsigned(constr_rand));
  end
end

endmodule // rand_seq

The output of the above simulation run is as follows. Note
that the system function call $unsigned was used to get the 32-bit
unsigned equivalents of the generated random numbers.

rand 0 = 303379748
rand 1 = -1064739199
rand 2 = -2071669239
rand 3 = -1309649309
rand 4 = 112818957

constr_range = 10

constr_rand 0 signed = 7,      unsigned = 7
constr_rand 1 signed = -1,     unsigned = 4294967295
constr_rand 2 signed = -4,     unsigned = 4294967292
constr_rand 3 signed = 1,      unsigned = 1
constr_rand 4 signed = 9,      unsigned = 9

Note that since the unsigned version was used, the negative numbers appear as
quite large values. For shorter ranges, a specific bit range such as [3:0]
could be used to get numbers within desired ranges.

SystemVerilog has the construct rand to declare random variables, and a
randomize method which assigns random values to those variables.
Another construct, named constraint, can be used to specify the constraints
of the returning value. 
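
In plain Verilog, the negative results seen above can be avoided by wrapping $random in a concatenation, which makes the value unsigned before the modulus is applied. The following is a minimal sketch of this; the module name rand_range_sketch and the seed value 42 are arbitrary choices for illustration. It also shows the optional seed argument of $random, which makes a run repeatable.

```verilog
// Sketch: non-negative constrained random values with a fixed seed
module rand_range_sketch;

  integer seed, i, val;

  initial begin
    seed = 42;                        // arbitrary seed for illustration
    for (i = 0; i < 5; i = i + 1) begin
      // The concatenation {} yields an unsigned value, so the
      // result of the modulus lies between 0 and 9
      val = {$random(seed)} % 10;
      $display("val %0d = %0d", i, val);
    end
  end

endmodule
```

The seed must be a variable (integer, reg or time), since $random updates it on every call.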

3.4.2 Explain with an example, how do I generate random 
stimulus? 



The $random system call, as illustrated in the previous question, can be
used for generating a random stimulus sequence. While it is quite
straightforward to get a random number, mapping this into a functional
stimulus is the tricky part. A mechanism to typecast the random number
obtained into a meaningful stimulus is required. The following example
illustrates how a random stimulus sequence can be generated by using the
tasks/methods of a Bus Functional Model (BFM). The background of the
BFM is as follows:

• The BFM has three tasks, for write, read and
read_modify_write commands.

• Each command in the BFM has arguments of addr, data, and
byte_enables, which specify the address to be read/written,
the write/expected data, and the byte enables for the field. For
simplicity, the address and data are assumed to be 32 bits, and the
byte_enable to be 4 bits.

The random stimulus generation is required to do the following: 

• The maximum number of transactions generated out of this 
random simulation must be user specifiable. 

• There should be a random number of wait states (between 0 and 
5) between the transactions. 

module rand_stim ();

parameter num_trans = 10; // how many transactions

// These define the mapping/type-casting of
// random integer to the task call
`define read 1
`define rmw 2
`define write 3

integer i;
reg [31:0] rand_addr, rand_data;
reg [3:0] rand_be;
reg [1:0] trans_type;
reg [2:0] rand_wait;

task write;
  input [31:0] rand_addr;
  input [31:0] rand_data;
  input [3:0] rand_be;
  begin
    $display("Executing write transaction: addr = %0h, data = %0h, be = %0h at t=%0d",
             rand_addr, rand_data, rand_be, $time);
    // Add the rest of the code for the write protocol
  end
endtask // write

task read;
  input [31:0] rand_addr;
  input [31:0] rand_data;
  input [3:0] rand_be;
  begin
    $display("Executing read transaction: addr = %0h, data = %0h, be = %0h at t=%0d",
             rand_addr, rand_data, rand_be, $time);
    // Add the rest of the code for the read protocol
  end
endtask // read

task rmw;
  input [31:0] rand_addr;
  input [31:0] rand_data;
  input [3:0] rand_be;
  begin
    $display("Executing rmw transaction: addr = %0h, data = %0h, be = %0h at t=%0d",
             rand_addr, rand_data, rand_be, $time);
    // Add the rest of the code for the rmw protocol
  end
endtask // rmw

task idle;
  input [2:0] num_wait;
  begin
    $display("Executing idle for %0d clocks", num_wait);
    // Add the for loop with statement similar to
    // @(posedge clk) here
  end
endtask // idle
initial begin
  // loops for user specified transactions
  for (i = 0; i < num_trans; i = i + 1) begin
    rand_addr = $random;
    rand_data = $random;
    rand_be = $random;
    rand_wait = $random;
    trans_type = $random;

    // $display ("trans_num = %0d, trans_type = %0d,
    //   rand_addr = %0h,\t rand_data = %0d,
    //   rand_be = %0h", i, trans_type, rand_addr,
    //   rand_data, rand_be);

    case (trans_type)
      `write  : write(rand_addr, rand_data, rand_be);
      `read   : read(rand_addr, rand_data, rand_be);
      `rmw    : rmw(rand_addr, rand_data, rand_be);
      // Extend for more task calls here
      // if none above, read
      default : read(rand_addr, rand_data, rand_be);
    endcase // trans_type

    if (rand_wait > 5)
      idle(0);
    else
      idle(rand_wait);

    // add code to terminate the loop through a break or
    // disable command if simulation needs to be terminated
    #5; // to move time
  end // for
end // initial

endmodule // rand_stim

The following is the output of the above code: 

Executing read transaction: addr = 12153524, data = c0895e81, be = 9 t=0
Executing idle for 3 clocks
Executing read transaction: addr = 46df998d, data = b2c28465, be = 2 t=5
Executing idle for 1 clocks
Executing read transaction: addr = 3b23f176, data = 1e8dcd3d, be = d t=10
Executing idle for 4 clocks
Executing write transaction: addr = e33724c6, data = e2f784c5, be = a t=15
Executing idle for 5 clocks
Executing read transaction: addr = 8932d612, data = 47ecdb8f, be = 2 t=20
Executing idle for 0 clocks
Executing read transaction: addr = e2ca4ec5, data = 2e58495c, be = d t=25
Executing idle for 5 clocks
Executing rmw transaction: addr = b1ef6263, data = 573870a, be = 0 t=30
Executing idle for 0 clocks
Executing write transaction: addr = cecccc9d, data = cb203e96, be = 3 t=35
Executing idle for 5 clocks
Executing read transaction: addr = 359fdd6b, data = eaa62ad5, be = 2 t=40
Executing idle for 0 clocks
Executing read transaction: addr = e7c572cf, data = 11844923, be = a t=45
Executing idle for 2 clocks

The following are a few salient points of the above code: 

• The number of transactions for execution can be changed during 
instantiation of the rand_stim module by overriding the default 
parameter num_trans value of 10. 

• In the above, the sequence generator and the tasks of the BFM are
illustrated at the same level. The random sequence generator could
be a module that invokes the tasks/methods of the BFM from a
different level of hierarchy too, thereby facilitating better functional
partitioning.

• There are no statements other than $display within the tasks, since
they are purely for illustration. In real life, the write, read, and rmw
tasks would contain the statements necessary to drive the protocol.
There would also be a signal like the clock, which would move the
simulation time as per the protocol. In the above code, #5 was used
as a constant time between transactions.

• The tasks should preferably be automatic, which makes them re-entrant,
so that multiple sources can call them concurrently. Also, avoid the use
of any global variables that could be modified by these tasks.

• Since each task is blocking in nature, the next task would begin only
after the completion of the previous one.

• Typically, before the terminal value of num_trans transactions is
reached, there could be a need to terminate the simulation or exit the
for loop. This can be achieved by the use of either the disable statement
or the break statement (available in SystemVerilog). 
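
The early exit mentioned in the last point can be sketched in plain Verilog with a named block and the disable statement. The module early_exit_sketch, the block name trans_loop, and the flag stop_req are hypothetical names for this illustration; the condition i == 3 merely stands in for a real error condition.

```verilog
// Sketch: leaving a for loop early by disabling a named block
module early_exit_sketch;

  parameter num_trans = 10;

  integer i;
  reg stop_req;   // hypothetical flag, e.g. set when a monitor reports an error

  initial begin
    stop_req = 0;
    begin : trans_loop
      for (i = 0; i < num_trans; i = i + 1) begin
        $display("transaction %0d at t=%0d", i, $time);
        if (i == 3)
          stop_req = 1;          // stands in for an error condition
        if (stop_req)
          disable trans_loop;    // leaves the loop, similar to break
        #5;
      end
    end
    // Execution resumes here after the named block is disabled
    $display("loop exited with i = %0d", i);
  end

endmodule
```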

3.4.3 How do I generate constrained random stimulus using 
Verilog? 

In the example in the previous FAQ, the code
produces addresses over the full 32-bit range. Sometimes, it is useful to
constrain the random address range for testing purposes.
For example, the address may need to be within 0x4000 and 0xa000.

In such scenarios, the code should be generic enough to cater to the user
specified min and max ranges of each variable that is to be constrained. The
above example can be updated to demonstrate how this can be achieved. The
same tasks of write, read, and rmw will be used for illustration here.
The only line changed is the following:

rand_addr = (min_addr + ({$random} %
             (max_addr - min_addr)));

The changes to the earlier code are as follows:

• The base address is set to be the min_addr variable.

• The constr_range range was chosen to be between 0 and
(max_addr - min_addr). This itself becomes the offset
to the base address.

• The {$random} now gives positive values, so the result of the
modulus lies between 0 and (max_addr - min_addr) - 1.

The output with the above changes is as follows: 



Executing read transaction: addr = 7524, data = c0895e81, be = 9 t=0
Executing idle for 3 clocks
Executing read transaction: addr = 998d, data = b2c28465, be = 2 t=5
Executing idle for 1 clocks
Executing read transaction: addr = 5176, data = 1e8dcd3d, be = d t=10
Executing idle for 4 clocks
Executing write transaction: addr = 64c6, data = e2f784c5, be = a t=15
Executing idle for 5 clocks
Executing read transaction: addr = 9612, data = 47ecdb8f, be = 2 t=20
Executing idle for 0 clocks
Executing read transaction: addr = 4ec5, data = 2e58495c, be = d t=25
Executing idle for 5 clocks
Executing rmw transaction: addr = 6263, data = 573870a, be = 0 t=30
Executing idle for 0 clocks
Executing write transaction: addr = 6c9d, data = cb203e96, be = 3 t=35
Executing idle for 5 clocks
Executing read transaction: addr = 7d6b, data = eaa62ad5, be = 2 t=40
Executing idle for 0 clocks
Executing read transaction: addr = 72cf, data = 11844923, be = a t=45
Executing idle for 2 clocks

Note that now the addresses are well within the desired range of 0x4000
to 0xa000. The above mechanism can be used to constrain the other
variables, like the byte enables, data, etc., for the stimulus generation. For
example, the number of clocks to be idle can also be
constrained between min and max values.

For the user interface, it is recommended to group the min and max
values of the variables into a separate file. This can either be included using
the `include directive, or placed in a separate parameter file. That way, the user is
only required to change the values within this file, and then run the
simulation. Robust code should also ensure that, if no
values are specified by the user, some default values apply to
these min and max values of the variables.

In SystemVerilog, the constraint construct can be used to specify the
bounded ranges for the random variables. 
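
In plain Verilog, the constrained range calculation can be wrapped into a reusable function, sketched below. The module constr_rand_sketch and the function rand_in_range are hypothetical names. Unlike the expression shown earlier, this variant adds 1 to the modulus so that the upper bound itself can also be generated; it assumes lo <= hi and a range narrower than the full 32 bits.

```verilog
// Sketch: reusable constrained random value with inclusive bounds
module constr_rand_sketch;

  parameter min_addr = 32'h0000_4000;   // defaults, can be overridden
  parameter max_addr = 32'h0000_a000;

  integer i;
  reg [31:0] rand_addr;

  // Returns a value between lo and hi, both bounds included.
  // Assumes lo <= hi and (hi - lo + 1) does not overflow 32 bits.
  function [31:0] rand_in_range;
    input [31:0] lo;
    input [31:0] hi;
    begin
      rand_in_range = lo + ({$random} % (hi - lo + 1));
    end
  endfunction

  initial begin
    for (i = 0; i < 5; i = i + 1) begin
      rand_addr = rand_in_range(min_addr, max_addr);
      $display("rand_addr = %0h", rand_addr);
    end
  end

endmodule
```

The same function can be reused for the other variables, for example rand_in_range(0, 5) for the number of idle clocks.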

3.4.4 How can I be sure that the constrained random stimulus has 
covered all the values in the range without repetition in a 
cyclic random fashion? Illustrate this with an example. 

One of the limitations of the above constrained range is that, even though
the transaction sequence is random, there is a high possibility that the
same values get repeated. This would still be meaningful, but
unnecessary: the simulation session could be repeating the same scenario
over and over again, which adds simulation cycles before the
functional coverage goals are achieved.

One way to resolve this is to have a mechanism that returns a
random sequence in which every value is unique, so that all the
intended values are guaranteed to be hit in one session.

The following is reusable behavioural code that generates a unique
value every time it is called. The only post-processing that needs to be done
is to type-cast this to a meaningful variable, whether it is a transaction type
or an address/data/byte-enable value.

The example takes as input the number of values involved in the random
stimulus range. Although the unique values returned are between 0 and
num_values-1, they can act as an offset to a base value, if the range is
between two positive numbers. As required, the positive numbers can then
be type-cast into a meaningful outcome of the transaction, as explained
earlier in FAQ 3.4.2.

module get_uniq_id ();

  // user defined number of variables, that is, range
  parameter num_values = 4;

  integer index_depth, i, j, rand_index;

  // define a 1 bit memory, one 'hit' flag per value
  reg index_addr [num_values-1:0];
  reg full;
  wire result;
  reg uniq_hit;

  // This task resets the memory to 0
  task reset_array;
    begin
      for (i = 0; i < num_values; i = i + 1) begin
        index_addr[i] = 0;
      end
    end
  endtask // reset_array

  // This task scans thru the memory to see if the
  // variable has been requested earlier. If so, go to the
  // next random index, else return the index value
  task get_uniq_stim;
    output [31:0] index;
    begin
      uniq_hit = 0;

      while (~uniq_hit) begin : search_uniq
        rand_index = {$random} % num_values;
        // $display ("rand_index = %0d", rand_index);

        if (index_addr[rand_index] == 0) begin
          index_addr[rand_index] = 1; // set as hit
          uniq_hit = 1;
          disable search_uniq; // stop scanning
        end else begin
          uniq_hit = 0; // continue scanning
        end // else
      end // while

      index = rand_index; // return this unique value
    end // begin
  endtask // get_uniq_stim

  // Checks to see if all the values have been 'hit'. If
  // so, return a 1, else 0
  task check_full;
    output full;
    begin
      full = 1;
      for (i = 0; i < num_values; i = i + 1) begin
        full = full & index_addr[i];
      end // for
    end // begin
  endtask // check_full

endmodule // get_uniq_id

// Following module instantiates the get_uniq_id
// module to get a unique number in each consecutive
// random call

module test_get_uniq_id ();

  // User defined range of values
  parameter num_values = 6;
  integer index_depth, j, rand_index;
  integer recvd_index;
  reg full;

  get_uniq_id # (num_values)
    U_get_uniq_id (); // Instantiate the module

  initial begin
    U_get_uniq_id.reset_array;

    $display ("num_values = %0d", num_values);
    full = 0;
    j = 0;

    // keep looping reactively until full range is done
    while (~full) begin
      U_get_uniq_id.get_uniq_stim (recvd_index);

      $display ("Iteration %0d, Received unique index = %0d",
                j, recvd_index);

      U_get_uniq_id.check_full (full);
      j = j + 1; // For tracking purpose only
    end // while
  end // initial

endmodule // test_get_uniq_id

The output of the above simulation run is: 

num_values = 6 

Iteration 0, Received unique index = 2 
Iteration 1, Received unique index = 3
Iteration 2, Received unique index = 1 
Iteration 3, Received unique index = 5 
Iteration 4, Received unique index = 4 
Iteration 5, Received unique index = 0 

The following are the salient points of the observations: 

• Notice that, for all the iterations from 0 to (num_values-1),
all the values were unique.

• When you replay, that is, simulate this code any number of times, the
same sequence repeats although the sequence by itself is random in
nature, that is, the sequence is pseudo-random.

• The while loop will automatically stop when all the unique values have
been ‘hit’, thus guaranteeing ‘touching’ all values of the variables within
the range with the minimal number of calls to the task. This ensures that all
the values have been cyclically touched. This improves the functional
coverage efficiency, since fewer calls are required.

• The usage scenario of this task can be further improved by encapsulating
this into a module, and instantiating the same for other variables like
data, byte enables etc.

These returned values can now be type-cast to either task calls, or as
randomised inputs to other task calls, as illustrated in the earlier example.

SystemVerilog has the randc modifier for random variables, which implements
the above functionality of randomly cycling through all the values in a random
permutation of their declared range. The same functionality above can be
implemented in SystemVerilog with far fewer lines of code.
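
Staying within plain Verilog, the same guarantee can also be sketched by shuffling an array of indices once and then reading it out in order. This is a different technique from the retry loop of get_uniq_id (a Fisher-Yates shuffle), offered only as an illustration; the module name shuffle_demo and the array name perm are hypothetical.

```verilog
module shuffle_demo;

  parameter num_values = 6;

  integer perm [0:num_values-1]; // holds a permutation of 0 .. num_values-1
  integer i, j, tmp;

  initial begin
    // Start from the identity permutation
    for (i = 0; i < num_values; i = i + 1)
      perm[i] = i;

    // Fisher-Yates shuffle: swap position i with a random
    // position j chosen from 0 .. i, working from the top down
    for (i = num_values - 1; i > 0; i = i - 1) begin
      j = {$random} % (i + 1);
      tmp = perm[i];
      perm[i] = perm[j];
      perm[j] = tmp;
    end

    // Every value appears exactly once, in shuffled order
    for (i = 0; i < num_values; i = i + 1)
      $display ("Iteration %0d, unique index = %0d", i, perm[i]);
  end

endmodule // shuffle_demo
```

The shuffle makes exactly num_values-1 calls to $random per set, whereas the retry loop may call $random several times before it finds an index that has not yet been hit. The trade-off is that storage grows from one bit per value to one integer per value.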

3.4.5 How can I change the sequence of constrained random 
stimulus? Illustrate this with an example. 

Notice in the approach used in the previous FAQ, that the sequence is 
pseudo-random, that is, the sequence repeats exactly the same way, any 
number of times we run the session. This is because the $random system
function has a default value of a seed within it. The examples earlier simply used
the same default seed value.

Suppose we wanted a different sequence other than the 2, 3, 1, 5, 4, 0 as
obtained earlier, then how should the user influence this? The solution is to 
simply use a different seed value, explicitly specified by the user, so that the 
rest of the sequencing will get generated based on this seed. 

Rather than repeating the entire example again, if the following changes
are done to the get_uniq_id module, then the module begins to return
unique values based on a seed, specified by the user.

• Add a parameter in the file, which the user can override, together
with an integer variable that holds the running seed. The $random
function updates its seed argument on every call, so the argument
must be a variable rather than a parameter, that is,

parameter seed = 5;
integer seed_var;

• Load the variable from the parameter before the first random call,
for example as the first statement of the reset_array task, that is,

seed_var = seed;

• Change the rand_index assignment without the seed, to the one
with seed, that is, change the line

rand_index = {$random} % num_values;
to

rand_index = {$random(seed_var)} % num_values;

In the test_get_uniq_id, the provision for the user to override the
seed value needs to be specified through a parameter override during
instantiation, where seed is a parameter declared in test_get_uniq_id,
that is,

• Change the line:

get_uniq_id # (num_values) U_get_uniq_id ();
to

get_uniq_id # (num_values, seed)
U_get_uniq_id ();

When the simulation is run with different seed values, the following 
are the sequences obtained: 



Table 3-13. The unique pseudo-random sequence seen for different seed values

Seed = 10    Seed = 20    Seed = 30    Seed = 40
    2            0            4            2
    0            5            2            3
    1            3            1            0
    3            4            0            5
    5            2            3            1
    4            1            5            4



Notice that, with different seed values, the sequencing is fully different. 
Thus it can be mapped into different transactions for obtaining a different 
sequence of transactions for different seeds. 
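
As a further sketch, the seed can also be supplied at invocation time instead of through a parameter, so that the code need not be recompiled for each seed. The example below assumes a hypothetical +seed=<value> plusarg and uses the Verilog-2001 $value$plusargs system function; the module name and the default of 5 are illustrative only.

```verilog
module seed_demo;

  integer seed_var; // $random updates this variable on every call
  integer i;

  initial begin
    // Use the hypothetical +seed=<value> plusarg when present,
    // otherwise fall back to a default seed
    if (!$value$plusargs ("seed=%d", seed_var))
      seed_var = 5;

    $display ("Starting seed = %0d", seed_var);

    for (i = 0; i < 4; i = i + 1)
      $display ("value = %0d", {$random(seed_var)} % 6);
  end

endmodule // seed_demo
```

Printing the starting seed in the log is a useful habit, since a failing random run can then be replayed with the same seed.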

3.4.6 What is weighted random stimulus? Illustrate this with an 
example. 

Note that, in the examples in the previous FAQ, the sequence of random
numbers was indeed pseudo-random, that is, each random call produced a
sequence deterministically.

If this approach were used in a transaction based protocol scenario, then 
each random sample would provide a deterministic sequence of the 
transactions to follow. For example, in the read, write, and rmw 
example earlier, there would definitely be one occurrence of read or 
write or rmw command each time. 

Suppose there was a scenario in which the rmw command were to occur 
20% of time, the write command were to occur 30% of time, and the 
remaining 50% was for read transactions, how could this be achieved? 
Specifying this probability of occurrence would be useful, to reflect the real 
life scenario of transactions the DUT would be subjected to. 

The following paragraphs explain the above scenario, that is, in a given
stimulus sequence of 10 samples, the read would occur only 5 times, the
write only 3 times and the rmw only 2 times. Note that the actual sequence
would be random among themselves, but still maintain the relative
probability of occurrence between them. This example implementation can
be generalised or extended for more transaction types.

In the illustrations before, the random stimulus could be obtained in
different ways, either unconstrained, or constrained for unique, or repeatable
values. Suppose we have a requirement wherein a particular model is to
return different kinds of responses, but each response has a particular
weightage. For example, a particular slave model is to respond with
probabilities of 40% Okay, 10% Error, 30% Disconnect and 20% Abort
responses, and these are to be in a random sequence. This kind of random
stimulus, whose pattern occurs with a different probability based on the
weightage in a given distribution, is called weighted random stimulus.

The following example illustrates how the different percentages of the 
responses can be achieved in a very generic fashion. The background for this 
example is as follows: 

• The weight_rand module has an input get_nxt_response, 
which, when sampled, returns a response at the next clock. This 
implementation has four response types: Okay, Error, Disconnect 
and Abort. 

• For a given sequence of stimulus, the probability of a given
response type is the weight for that response divided by the total
weight of all responses. That means the sum of the weights need
not necessarily be 100. It could be Okay=10, Error=25,
Disconnect=30, and Abort=15. In this case, the sum is not 100,
but the percentage of Okay responses is 10/(10+25+30+15), that is,
1/8 (12.5%), the percentage of Error is 25/80 (31.25%) and so on.

The example code that implements the above is illustrated as follows: 

module resp_cntr (clk, update, clear);

  input clk, update, clear;

  reg full;

  parameter count = 10; // Can be overridden thru
                        // the use of defparams

  integer cntr;

  always @ (posedge clk) begin
    if (clear) begin
      cntr <= 0;
      full <= 0;
    end else if (~full & update) begin
      cntr <= cntr + 1;
      if (cntr == count-1)
        full <= 1;
    end
  end

endmodule // resp_cntr

module weight_rand (clk, clear, get_nxt_response,
                    response, full);

  input clk;
  input clear;
  input get_nxt_response;
  output [1:0] response;
  output full;

  parameter abort_weight = 2;
  parameter discon_weight = 3;
  parameter okay_weight = 4;
  parameter error_weight = 1;

  integer i, range;
  reg [3:0] constr_rand;
  reg sim_done, continue_search;
  reg abort_update, discon_update;
  reg okay_update, err_update;
  reg auto_clear;
  reg [1:0] response;
  wire clear_cntr;
  wire full;

  `define abort 0
  `define disconnect 1
  `define okay 2
  `define error 3

  // There are 4 instances of the resp_cntr
  // module, one for each of the response-types.
  resp_cntr # (abort_weight) abort_cnt
    (clk, abort_update, clear_cntr);
  resp_cntr # (discon_weight) discon_cnt
    (clk, discon_update, clear_cntr);
  resp_cntr # (okay_weight) okay_cnt
    (clk, okay_update, clear_cntr);
  resp_cntr # (error_weight) error_cnt
    (clk, err_update, clear_cntr);

  assign clear_cntr = clear | auto_clear;
  assign full = (abort_cnt.full & discon_cnt.full &
                 okay_cnt.full & error_cnt.full);

  initial begin
    range = abort_weight + discon_weight +
            okay_weight + error_weight;
    $display ("range = %0d", range);
    abort_update = 1'b0;
    discon_update = 1'b0;
    okay_update = 1'b0;
    err_update = 1'b0;
  end

  always @(posedge clk) begin
    abort_update <= 1'b0;
    discon_update <= 1'b0;
    okay_update <= 1'b0;
    err_update <= 1'b0;
    auto_clear <= 1'b0;
    if (get_nxt_response) begin
      continue_search = 1;

      if (abort_cnt.full & discon_cnt.full &
          okay_cnt.full & error_cnt.full)
        auto_clear <= 1'b1;
      else begin
        // Keep drawing until a response type is found whose
        // counter has not yet reached its weight
        while (continue_search) begin : next_uniq_hit
          constr_rand = {$random} % range;
          case (constr_rand)
            `abort :
              if (~abort_cnt.full) begin
                abort_update <= 1'b1;
                $display ("abort response t=%0d", $time);
                response <= 2'b00;
                continue_search = 0;
              end else
                continue_search = 1;

            `disconnect :
              if (~discon_cnt.full) begin
                discon_update <= 1'b1;
                $display ("disconnect response t=%0d", $time);
                response <= 2'b01;
                continue_search = 0;
              end else
                continue_search = 1;

            `okay :
              if (~okay_cnt.full) begin
                okay_update <= 1'b1;
                $display ("okay response t=%0d", $time);
                response <= 2'b10;
                continue_search = 0;
              end else
                continue_search = 1;

            `error :
              if (~error_cnt.full) begin
                err_update <= 1'b1;
                $display ("error response t=%0d", $time);
                response <= 2'b11;
                continue_search = 0;
              end else
                continue_search = 1;
          endcase // case
        end // while
      end // else
    end // if
  end // always

endmodule

The testbench that instantiates the above is illustrated below:

module test_weight_rand ();

  parameter width = 8;

  reg clk, clear, get_nxt_response;
  reg done;
  wire [1:0] response;
  integer i;
  wire full;

  weight_rand U_weight_rand (
    .clk (clk),
    .clear (clear),
    .get_nxt_response (get_nxt_response),
    .response (response),
    .full (full)
  );

  initial begin
    clear = 0;
    @(posedge clk);
    clear = 1;
    @(posedge clk);
    clear = 0;
    i = 0;
    done = 0;

    while (~done) begin
      get_nxt_response <= 1'b1;
      @(posedge clk);
      if (get_nxt_response & ~full)
        i = i + 1;
      get_nxt_response <= 0;
      @(posedge clk);

      if (i > 0 && i % 10 == 0)
        $display ("Iteration of 10 samples done t=%0d\n", $time);
      if (i == 100) done = 1;
    end

    $finish;
  end

  initial begin
    clk = 0;
    forever begin
      #(5) clk = ~clk;
    end
  end

endmodule // test_weight_rand

Running the above with weights of Abort=2, Disconnect=3, Okay=4, and 
Error=1, the outputs during different iterations are as captured in the
following table: 



Table 3-14. The different random responses with weightages

Iteration 1   Iteration 2   Iteration 3   Iteration 4   Iteration 5
Okay          Disconnect    Disconnect    Abort         Disconnect
Disconnect    Error         Error         Okay          Error
Okay          Okay          Abort         Abort         Okay
Okay          Disconnect    Okay          Okay          Disconnect
Abort         Abort         Disconnect    Disconnect    Okay
Error         Abort         Disconnect    Okay          Disconnect
Abort         Disconnect    Abort         Disconnect    Okay
Disconnect    Okay          Okay          Disconnect    Okay
Okay          Okay          Okay          Okay          Abort
              Okay          Okay          Error         Abort
O=4, D=2,     O=4, D=2,     O=4, D=2,
E=1, A=2      E=1, A=2      E=1, A=2







Some of the salient points regarding the above illustration are: 

• The above example illustrates four response types. This can,
however, be modified easily for more or fewer response types. The
probability of the response will be based on the weightage that the
response type has in totality with the rest of the weights.

• The example auto-clears itself when one set of response samples is
done, where one set is the total of all the weights. The next iteration
starts with a totally different starting point, but still adds up to be the
same as specified in the weights, in the next set of iterations.

• The weightage in this illustration is static, that is, the weights are
decided during the instantiation of the module. This can be made
dynamic, by adding new input ports to this module, which will
specify the weightage, instead of parameters. Typically this is not
modified until the current sets of transactions are completed.

• The simulation can be extended for as many samples as are
required, and the distribution remains consistent as time progresses.

SystemVerilog has the dist construct that implements the above
functionality of weighted distribution of the random variables.
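
The relationship between weight and probability described in this FAQ can also be sketched without counters, by comparing a single random draw against cumulative weight thresholds. This is a simpler and different technique from the weight_rand module: it yields the weighted proportions only on average, and does not guarantee exact counts within each set of samples. The module name, function name and response encoding below are illustrative assumptions.

```verilog
module weighted_pick_demo;

  parameter okay_weight   = 10;
  parameter error_weight  = 25;
  parameter discon_weight = 30;
  parameter abort_weight  = 15;

  integer total, r, n;

  // Maps a draw in 0 .. total-1 onto a response type.
  // Illustrative encoding: 0=Okay, 1=Error, 2=Disconnect, 3=Abort
  function [1:0] pick;
    input integer draw;
    begin
      if (draw < okay_weight)
        pick = 2'd0;
      else if (draw < okay_weight + error_weight)
        pick = 2'd1;
      else if (draw < okay_weight + error_weight + discon_weight)
        pick = 2'd2;
      else
        pick = 2'd3;
    end
  endfunction

  initial begin
    total = okay_weight + error_weight + discon_weight + abort_weight;

    for (n = 0; n < 8; n = n + 1) begin
      r = {$random} % total; // uniform draw over 0 .. total-1
      $display ("draw = %0d, response = %0d", r, pick(r));
    end
  end

endmodule // weighted_pick_demo
```

With the weights above, 10 of the 80 possible draws map to Okay, which gives the 12.5% figure quoted earlier.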

3.4.7 What metrics help in defining the completeness of the 
random simulations? 

The random simulations keep producing different stimuli, based on the 
seed value to begin with. The completeness of the random simulation is a 
difficult question to answer. However, the following metrics help to improve 
the confidence level upon completion of the random simulations: 

1. Ensure that the constraints are different for each of the seeds to be run. 
The constraints must cover both breadth and depth in the simulation. 
The following is an example table to illustrate how the breadth and 
depth is reached for each feature of the product. Note that this example 
has only a very few examples of the constraint variables. These could be
more in number for the specific project being worked on.



Table 3-15. Illustration of breadth and depth in random constraints

            Constraints
  Host side       Application side
Abort   Split     Okay    Error     Comments
  0       0                 0       Normal termination, full data transfers
  0       0         0      100      App exception terminations, no data
                                    transfers
 100      0         0       0       Host exception terminations, partial data
                                    transfers
  50      50       100      0       Host Aborts and Splits 50% of transfers,
                                    App normal termination, full data transfers
Many more such combinations         Ensure different scenarios are all
                                    considered fully



2. Make sure that the random simulations with the constraints decided have 
been run with different seed values. This will help to ensure that the 
trajectory taken by the random sequence generator is reasonably 
different each time, to activate different functional paths. 

3. It is recommended to run a set of seeds with the directed constraints as 
illustrated in the above table, and another set of seeds with the 
constraints also randomized.

4. If the DUT has parameters that influence the functionality of the
implementation, then the above two steps need to be repeated for the 
different configurations with varied parameter combinations, too. This 
will ensure that the different sets of parameters that influence the DUT 
functionality also get verified. 

5. If there are dual clock FIFOs in the design that deal with data 
synchronization between the two different clock domains, you need to 
ensure running all the above with different clock frequencies. Typically 
this variance is a subset of the DUT parameter combinations. This will 
cause the FIFO flags, like full and empty, to change with different 
latencies, allowing any design assumptions that depend upon these 
latencies to be checked. If the frequency of both sides of the FIFO is 
assumed to have a relationship with any one side being faster than the 
other, then all three ratios of clk1>clk2, clk1=clk2 and clk1<clk2
frequencies should be run. 

6. Some approaches have a limit for the number of transactions 
successfully completed as the criterion. In that case, the limit on the 
number of transactions should be increased, until no problems/issues are
observed for one or two weeks of wall clock time. While this step
improves the confidence, it might still happen that the random 
sequencing has not hit a hidden issue yet. This approach does not give a 
measure of the functional space the simulation has covered and has a 
high possibility of repeating similar scenarios over and over again. 

7. If the verification environment has a functional coverage metric, then the 
random simulation needs to be run until these functional coverage 
metrics have been met. This approach is more measurable, and takes
less time, than the above method of running for a long time.
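
The idea of running until a functional coverage goal is met, as in the last point above, can be sketched in plain Verilog with one 'hit' flag per coverage bin. The four bins here stand for four response types; the module and signal names are illustrative assumptions, and a real environment would sample the bins from DUT activity rather than directly from $random.

```verilog
module cov_demo;

  reg [3:0] hit;   // one coverage bin per response type
  reg [1:0] resp;  // randomly chosen response type
  integer   n;     // number of samples taken so far

  initial begin
    hit = 4'b0000;
    n = 0;

    // Keep sampling until every bin has been hit at least once
    while (hit != 4'b1111) begin
      resp = {$random} % 4;
      hit[resp] = 1'b1;
      n = n + 1;
    end

    $display ("All 4 bins covered after %0d samples", n);
  end

endmodule // cov_demo
```

The stopping condition is tied to what has been exercised, not to a transaction count or to wall clock time.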

3.5 Stimulus generation 

In an ideal verification environment, any transaction scenario should be 
recreatable by the BFM. Sometimes, the logic to recreate a previously 
unanticipated stimulus sequence could either take a long time, or is too 
tedious to be implemented. This section discusses the different techniques of 
generating the stimulus for the DUT without use of any BFM, and how the 
same can be achieved, using the constructs in Verilog. Note that, eventually, 
it is important to have the recreation mechanism within the BFM for not 
only controllability, but also as a part of permanent testcase recreation in the
regression suite. 

3.5.1 What are some stimulus generation techniques when the 
stimulus is not reproducible using BFMs? Illustrate these 
with specific examples using Verilog. 

The following are a few techniques of generating the stimulus for the 
DUT. Each method has its pros and cons, and the method chosen has to be 
appropriate to the usage scenario. 

1. Vector replaying: This technique is sometimes useful to replay the
stimulus from a file in the form of 1's and 0's as an input to the
DUT. This will be useful when the stimulus is impossible or tedious to
reproduce using the conventional methods of commands of a BFM.
An example of how this is used is illustrated as follows:

module file_stimulus ();

  parameter num_vecs = 5;

  integer i, in_vecfile;
  reg [16:0] vec_mem [num_vecs-1:0];
  reg [16:0] curr_data;
  wire [7:0] data_in, addr_in;
  wire wr_en_in;

  initial begin
    $readmemh ("inputs.vec", vec_mem);

    for (i = 0; i < num_vecs; i = i + 1) begin
      curr_data = vec_mem[i];

      // Bit 16 is wr_en, bits 15:8 are addr, bits 7:0 are data
      $display ("i=%0d, \t wr_en = %b, data_in = %0h, addr_in = %0h",
                i, curr_data[16], curr_data[7:0], curr_data[15:8]);

      // Do the actual assignments of the inputs wr_en,
      // addr and data to the DUT. The above $display was
      // for illustration/checking.
    end

    $finish;
  end // initial

endmodule // file_stimulus

The inputs.vec file contains the following:

1_a1_d1 

0_a2_d2 

0_a3_d3 

0_a4_d4 

1_a5_d5 

The output of the above code produces: 

i=0,     wr_en = 1, data_in = d1, addr_in = a1
i=1,     wr_en = 0, data_in = d2, addr_in = a2
i=2,     wr_en = 0, data_in = d3, addr_in = a3
i=3,     wr_en = 0, data_in = d4, addr_in = a4
i=4,     wr_en = 1, data_in = d5, addr_in = a5

To capture this vector file for replay, the mechanism indicated in the 
following code snippet can be used: 

`timescale 1ns/1ps

module vectorcapture ();

  parameter num_vecs = 5;

  integer i, in_vecfile;
  reg [7:0] addr, data;
  reg wr_en;
  reg clk;

  initial begin
    clk = 0;
    forever clk = #5 ~clk;
  end

  initial begin
    in_vecfile = $fopen ("inputs.vec", "w");

    for (i = 0; i < num_vecs; i = i + 1) begin
      @(posedge clk);
      // $fstrobe writes at the end of the time step, after the
      // block below has updated the values for this clock edge.
      // The field order wr_en_addr_data matches the layout that
      // file_stimulus expects during replay.
      $fstrobe (in_vecfile, "%b_%h_%h", wr_en, addr, data);
    end

    @(negedge clk); // let the last $fstrobe complete before closing
    $fclose (in_vecfile);
  end // initial

  // This initial block emulates the capturing of the
  // vectors out of DUT
  initial begin
    @(posedge clk);
    wr_en = 1; addr = 8'ha1; data = 8'hd1;
    @(posedge clk);
    wr_en = 0; addr = 8'ha2; data = 8'hd2;
    @(posedge clk);
    wr_en = 0; addr = 8'ha3; data = 8'hd3;
    @(posedge clk);
    wr_en = 0; addr = 8'ha4; data = 8'hd4;
    @(posedge clk);
    wr_en = 1; addr = 8'ha5; data = 8'hd5;
    @(posedge clk);
    $finish;
  end

endmodule // vectorcapture

Some of the critical factors to be considered during the vector replay 
mechanism are: 

• When there are inout ports in the port list of the vector playback 
mechanism, it is important to avoid a conflict/contention for the inout 
ports. It is useful to capture the state of the output-enable involving 
the inout ports during the vector replay. 

• Sometimes, the strengths of signals are also involved during the 
vector-playback. In that case, additional ports of information need to 
be communicated, based on the strengths at different instances of the 
simulation session. A gasket or wrapper needs to be written that will 
infer the strength values and drive the signal strength accordingly. 

2. Force/Release commands:

Sometimes, it is not possible to recreate a problem using the BFM
infrastructure that the product has. One of the ways to recreate the problem
is through the use of force and release commands in Verilog. These
commands, as they indicate, will force the specified value to a net or
variable until it is released.

In the following example, the wire w1 mimics the behavior of an
incorrectly functioning internal net in a test environment which is always
being driven to 0. The force command on that net was timed, so as to see its
assertion of correct values, and then released back.

module forcerelease;

  wire [1:0] w1;

  assign w1 = 0;

  initial begin
    #5 $display ("w1 = %0d, t = %0d", w1, $time);
    force w1 = 2;
    #5 $display ("w1 = %0d, t = %0d", w1, $time);
    release w1;
    $display ("w1 = %0d, t = %0d", w1, $time);
  end

endmodule // forcerelease

The above code displays:

w1 = 0, t = 5
w1 = 2, t = 10
w1 = 0, t = 10



Some other salient features of the force and release mechanism are: 

• When the target of the force command is a wire, that is, the
left-hand side of a continuous assignment, the target will
get re-established back to the value being driven after the release
command. Hence, w1 is back to 0 after the release.

• When the target of the force command is a reg or other variable type,
the target will retain the forced value after the release until another
assignment is made to the variable.

• The force and release commands can be implemented over internal
nets, using hierarchical paths from the testbench onwards. In the
above example, it could have been U1.U2.U3.w1, instead of the
top level net w1.

• Although the above example illustrates w1 as the target of a
continuous assignment, using a wire, the same code works with w1
declared as a reg and assigned procedurally, that is, the force
command works on both reg variables and wire nets. One difference
is that the display of w1 in the third line would still retain the
value of 2, since a reg remembers what
was assigned to it last until the next procedural assignment, as
follows:

w1 = 0, t = 5
w1 = 2, t = 10
w1 = 2, t = 10

• The force command cannot be applied on variables within an 
automatic task, since they get de-allocated at the end of task 
execution. It can, however, be used on variables within a static task. 
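
The reg case described in the points above can be sketched as follows. The module name forcerelease_reg and the variable r1 are hypothetical; the procedural assignment at the end is added only to show the released variable accepting new values again.

```verilog
module forcerelease_reg;

  reg [1:0] r1;

  initial begin
    r1 = 0;
    #5 $display ("r1 = %0d, t = %0d", r1, $time);

    force r1 = 2;   // overrides any procedural assignment to r1
    #5 $display ("r1 = %0d, t = %0d", r1, $time);

    release r1;     // a variable keeps the forced value after release
    $display ("r1 = %0d, t = %0d", r1, $time);

    r1 = 1;         // the next procedural assignment takes effect again
    $display ("r1 = %0d, t = %0d", r1, $time);
  end

endmodule // forcerelease_reg
```

The first three displays are expected to follow the 0, 2, 2 pattern listed above for a reg, and the last one should show the newly assigned value.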

3.6 Gate level simulations 

This section discusses a few questions that arise prior to, and during gate 
level simulations. A gate level simulation consists of replacing the RTL 
DUT in the testbench with a synthesized netlist. The gate level simulations 
are typically run before signing off the netlist for fabrication. The gate level 
simulations are typically longer in time, compared to the RTL level 
simulations, due to more nodes and events toggling in the netlist. 

3.6.1 What is SDF back-annotation, and how is it implemented in 
Verilog testbench? 

SDF annotation is the process to annotate timing information from a 
Standard Delay Format (SDF) file into the design using the Verilog 
$sdf_annotate system task. The SDF file contains the timing values for 
specify path delays, specparam values, timing check constraints, and 
interconnect delays. These timing values are usually based on the ASIC 
vendors’ characterization of their technology library. 

An example of an SDF back-annotation is:

initial begin
  $sdf_annotate ("DUT.sdf", U_top, , , "MINIMUM", , );
end

Some of the characteristics of the SDF back-annotation implementation
are:

• Since the SDF annotation is an ordered process, that is, the 
contents of the SDF file are annotated in the order they are 
specified, the SDF constructs can be overridden by specification 
from a later annotation. 

• More than one SDF file can be annotated. Each call to the 
$sdf_annotate annotates the timing information from the SDF 
file specified in the command. 

• Different regions of a design can be annotated from different 
SDF files by specifying the regions’ hierarchy scope as the 
second argument to the $sdf_annotate command. In the example 
above, the topmost module has been annotated. It could, 
however, be specified for the individual hierarchies beneath, as 
U_top.Ul, U_top.U2, etc. It is then necessary to have the 
individual SDF files for each of these hierarchies. 

• The scale factors for the minimum, typical, and maximum values
are 1.0:1.0:1.0 by default. These can be changed in the
$sdf_annotate command, to modify the min:typ:max constants.

3.6.2 What are a few pre-requisites before running gate level 
simulations? 

Before running any gate level simulations, the following are a few points 
to be verified, to avoid the iterations and issues seen during gate simulations: 

1. Ensure that the synthesis constraints were subjected to the extremes of 
the operating conditions. This is to avoid setup and hold violations. A 
clean run on Static Timing Analysis (STA) should be a pre-requisite 
before running a full timing gate simulation. 

2. Ensure that the assertion and de-assertion of all inputs to the DUT netlist 
in the testbench reflect the constraints to which the synthesis was done. 
This includes the reset signal, too. Unrealistic input delays to the DUT 
will cause excessive delays to the timing path, and hence, cause
functional mismatches during simulations. 

3. It may be necessary to annotate the SDF delays of the design into the 
simulation. Typically, the SDF file is created out of the synthesis tool 
that generated the netlist, too. The SDF file contains the critical timing 
information of the design. This timing information will maintain the 
timing requirements of setup and hold delays during simulation. 

4. The operating condition under which the gate simulation should be done 
can be specified in the SDF file. This can also be tuned to reflect
reality more closely.

5. Ensure that all the inputs to the netlist are initialized to a known value 
(sometimes required to be initialized at time=0). This will avoid 
unintentional unknown (X) propagation through the netlist. 
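
The last point above can be sketched as a small testbench skeleton that drives every input to a known value at time 0 and flags any input that still carries an X or Z. The signal names are illustrative assumptions, and the netlist instantiation is left out so that the sketch stays self-contained.

```verilog
module tb_init_demo;

  reg       clk, rst_n, wr_en;
  reg [7:0] data_in;

  // Drive every netlist input to a known value at time 0
  initial begin
    clk     = 1'b0;
    rst_n   = 1'b0;
    wr_en   = 1'b0;
    data_in = 8'h00;
    #20 rst_n = 1'b1; // release reset after two clock periods
    #20 $finish;
  end

  always #5 clk = ~clk;

  // The reduction XOR evaluates to X if any bit is X or Z,
  // so this check fires when an input has been left undriven
  always @(posedge clk)
    if (^{rst_n, wr_en, data_in} === 1'bx)
      $display ("Unknown value on DUT inputs at t=%0d", $time);

endmodule // tb_init_demo
```

With all inputs initialized as above, the check stays silent. Commenting out one of the time 0 assignments should make it report on every clock edge.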

3.6.3 What is the difference between unit delay and full timing 
simulations? 

Unit delay simulation is typically done either during the RTL or gate 
level simulations. Unit delay mode is used to debug basic design 
functionality. When simulated in this mode, all the delay specifications on 
gates, switches and continuous assignments are set to a delay of 1 simulator 
time unit. Unit delay simulation is enabled by the `delay_mode_unit
compiler directive. Many simulators can also enable unit delay simulation,
using the +delay_mode_unit invocation switch.

It is important to note that unit delay simulation is not a part of the
Verilog standard. While most Verilog simulators do support unit delay
simulation capabilities, it might not lead to identical results in all simulators.

The following example illustrates the `delay_mode_unit directive:

module test_unit();

  integer a;
  wire [31:0] b;

  initial begin
    a = 0;
    #4 a = 3;
  end

  assign #5 b = a;

  initial begin
    $monitor ("a = %0d, b = %0d, t=%0d", a, b, $time);
  end

endmodule

When the above code is simulated without the `delay_mode_unit
directive, the output from $monitor is as follows:

a = 0, b = x, t=0 
a = 3, b = x, t=4 
a = 3, b = 3, t=9 

With the `delay_mode_unit directive, the output is as follows:

a = 0, b = x, t=0 
a = 0, b = 0, t=1 
a = 3, b = 0, t=4 
a = 3, b = 3, t=5 

The key points to note here are: 

• The final assignment to the variable b happens at time 5 instead of 
9. 

• Only the delay in the continuous assignment has been collapsed to 1
time unit, whereas the procedural assignment delay of #4 has been
maintained.

• The unit delay simulation typically helps to check for any race 
conditions between the signal transitions with respect to the clock. 

• Unit delay mode does not do a good job of reflecting potential timing 
problems in real hardware like setup or hold problems. 

• The unit delay simulation is slower than RTL, but faster than full 
timing gate simulation, since there are fewer events to be resolved, 
compared to full timing gate simulation. 

Full timing gate simulation is carried out in the full timing mode of the
simulator. This mode simulates the effects of timing on the design logic. It
uses the delay equations specified in the technology file. The actual
pin-to-pin delays within the cells constitute the delay of the timing path. A
sanity run of full timing gate simulation is useful because:

• It checks the critical timing parameters, like setup and hold times for
storage elements.

• It exposes any discrepancies between the simulation and synthesis
models of the cells.

The issue with full timing gate simulation is the run time. It runs
longer than the RTL or unit delay simulation, since there are a lot more events
to be tracked by the simulator. The trend now is to combine formal
verification between the RTL and the gate netlist with static timing
analysis of the netlist.

3.6.4 My gate simulation is not passing, and some tests hang. What 
are the key points to look for? 

Gate simulations are sometimes run to see any timing effects that exist
due to the min/max operating conditions. Sometimes the gate simulation
is not functionally equivalent to the RTL simulation.
Some of the reasons are:

1. Variables are not specified correctly in the RTL sensitivity list. 

2. Conservative modeling in the cells may introduce Xs in the outputs, 
especially in the scenarios of setup and hold violations. This would 
cause a chain of events that would make the simulation hang. 

3. Incorrect constraint used during synthesis may cause the delay using 
the netlist to actually cross a clock period, and cause functionally 
incorrect behavior. 

4. A functionally incorrect response, due to bad logic in the netlist.

The net result of the above factors is that the DUT during simulation 
doesn’t respond in a functionally correct manner. The following are a few 
tips to be considered during gate simulation problems: 

1. Run the RTL through linting tools, to make sure that there aren’t any 
sensitivity list problems, and analyze the report of the linting tool 
carefully, to see there are no serious issues. 

2. During the synthesis compilation, look at the elaboration and 
compilation log file to see if there were any unusual warning / error 
messages from the synthesis tool about unconnected ports or floating 
logic being optimized away. 

3. Ensure that the constraints used during the synthesis actually reflect
the best and worst case operating conditions, clock uncertainty, clock
latency, wireload models, etc. Basically, the constraints have to be
close to the real life operating conditions of the chip for which the
netlist is being verified. If these constraints are not met, it is not
recommended to proceed to full timing gate simulation.

4. If the static timing analysis (STA) in the above step met the constraints
without any violations, the functional issue can be isolated as either a
timing problem or a bad logic problem. A timing problem can be identified by
normalizing the delays within the individual gates to one timescale
tick through unit delay simulation. Many simulators support the
`delay_mode_unit directive, or the +delay_mode_unit switch, for
unit delay simulation.

5. Even with unit delay simulation, if the problem persists, then it could 
either be a functionally incorrect model of the cell, or a bad logic 
problem during synthesis. Functional problems in the cells could 
result, due to incorrect modeling of the User Defined Primitives 
(UDPs) in the cell library, causing race conditions. Running a formal 
comparison between the RTL and the netlist can isolate the compare 
points where there are failures. 
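
The first cause listed earlier, an incomplete RTL sensitivity list, can be sketched as follows. This is only a minimal illustration; the module and signal names are hypothetical and not taken from any particular design. In RTL simulation, y_bad is re-evaluated only when a changes, whereas the synthesized logic follows both inputs, as y_good does.

```verilog
module sens_list_demo;

  reg a, b;
  reg y_bad, y_good;

  // Incomplete sensitivity list: b is read but not listed, so a change
  // on b alone does not re-evaluate y_bad in RTL simulation.
  always @(a)
    y_bad = a & b;

  // Verilog-2001 @* infers the complete sensitivity list (a and b).
  always @*
    y_good = a & b;

  initial begin
    #1 b = 0; a = 1;
    #1 b = 1;          // only b changes here
    #1 $display("y_bad = %b, y_good = %b", y_bad, y_good);
    // y_bad still holds 0, while y_good has followed b to 1
  end

endmodule
```

A linting tool would normally flag the first always block, which is why running lint on the RTL is the first tip in the list above.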

SUMMARY 

This chapter discussed details about the topics of verification. The 
chapter began with implementation of messaging, which is the mechanism 
of communication to the users. The next topic was the design of monitors 
and their usefulness. An example monitor code was also provided. BFMs 
were discussed next, which form the substance of the test environment. The 
last topic was stimulus generation through BFMs either by directed or 
random fashion. Examples were discussed for the different illustrations. 




Chapter 4 

MISCELLANEOUS 



INTRODUCTION 

This chapter lists various questions that may come up during the course 
of using the Verilog HDL. These FAQs are not in any particular order or 
category. 

4.1.1 What is the difference between a vectored and a scalared net? 

The scalared and vectored keywords are Verilog constructs used on multi-bit
nets to specify whether or not bit selects and part selects of the net are
permitted. For example,

module test_scalared_vectored;

wire scalared [3:0] wire1;
wire vectored [3:0] wire2;

wire bit1, bit2;

// syntax error to use bit select of a vectored net
assign wire2[1] = 1'b0;

// okay to use bit select of a scalared net
assign wire1[2] = 1'b1;

// syntax error to use bit selects of a vectored net
assign bit1 = wire2[1];

// okay to use bit selects of a scalared net
assign bit2 = wire1[2]; // scalared net

endmodule // test_scalared_vectored

4.1.2 What is the difference between assign-deassign and force- 
release? 

The assign-deassign and force-release constructs in Verilog have similar 
effects, but differ in the fact that force-release can be applicable to nets and 
variables, whereas assign-deassign is applicable only to variables. 

The procedural assign-deassign construct is intended to be used for 
modelling hardware behaviour, but the construct is not synthesizable by 
most logic synthesis tools. The force-release construct is intended for 
design verification, and is not synthesizable. 
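
The difference in applicability can be sketched with a small example. The module name and the signals n and r are hypothetical, chosen only for illustration: the procedural assign is used on the variable r only, while force and release are used on both the net n and the variable r.

```verilog
module assign_vs_force;

  wire n;   // net, driven by a continuous assignment
  reg  r;   // variable

  assign n = 1'b0;

  initial begin
    r = 1'b0;

    // Procedural continuous assignment: legal on the variable r only.
    assign r = 1'b1;
    #1 $display("r = %b after assign", r);
    deassign r;

    // force/release: legal on both the net n and the variable r.
    force n = 1'b1;
    force r = 1'b0;
    #1 $display("n = %b, r = %b while forced", n, r);

    release n;   // n returns to the value of its continuous driver
    release r;   // r keeps the forced value until it is assigned again
    #1 $display("n = %b, r = %b after release", n, r);
  end

endmodule
```

Writing "assign n = ..." inside the initial block would be rejected by the compiler, since n is a net.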

4.1.3 What is the order of precedence when both assign-deassign 
and force-release are used on the same variable? 

A force statement overrides a procedural assign statement on the same
variable until the force is released. The following example illustrates this:

module forcerelease;

reg [1:0] w1;

initial begin
    $display("1 w1 = %0d, t = %0d", w1, $time);
    assign w1 = 1;
    #5 $display("2 w1 = %0d, t = %0d", w1, $time);
    force w1 = 2;
    #5 $display("3 w1 = %0d, t = %0d", w1, $time);
    release w1;
    #5 $display("4 w1 = %0d, t = %0d", w1, $time);
    deassign w1;
    #5 $display("5 w1 = %0d, t = %0d", w1, $time);
end

endmodule

The above code produces the following output:

1 w1 = x, t = 0
2 w1 = 1, t = 5
3 w1 = 2, t = 10
4 w1 = 1, t = 15
5 w1 = 1, t = 20

As evident from the above, the force statement overrides the value
assigned earlier, and the variable returns to its assigned value after the
release statement. After the deassign, the variable retains its last value
until it is assigned again.

4.1.4 How can I abort execution of a task or a block of code? 

The Verilog disable statement aborts the execution of a
task or a named block of code. Disabling a block of code is useful in
scenarios like:

• Executing a “break” command within a loop, to skip the rest of the
loop iterations and exit the loop

• Terminating a task before its completion

Note that the disable statement is used with a block name. For example,

initial begin : block1
    begin : block2
        statement1;
        //etc.
        disable block2;
        statement5;
        statement6;
    end // of block2
    statement7;
end // of block1

In the above example, “block1” and “block2” are the block names. As
the statements get executed, when the disable statement is hit, the remaining
statements in block2, that is, statements 5 and 6, don’t get executed. They are
skipped, and the execution resumes from statement 7.

The only restriction on using the disable statement is that it cannot be
used in a function, as it would invalidate the function and its return value. It
can, however, be used in a task.
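
The second scenario, terminating a task before its completion, can be sketched as follows. The task count_to and its abort condition are hypothetical and serve only as an illustration: the task disables itself by name, and execution continues in the caller.

```verilog
module disable_task_demo;

  // Hypothetical task: counts up to 'limit', but gives up early
  // when the count reaches 3.
  task count_to;
    input integer limit;
    integer n;
    begin
      for (n = 0; n < limit; n = n + 1) begin
        if (n == 3) begin
          $display("count_to: aborting at n = %0d", n);
          disable count_to;   // terminates the task, returns to the caller
        end
        $display("count_to: n = %0d", n);
      end
    end
  endtask

  initial begin
    count_to(10);
    $display("back in the caller");
  end

endmodule
```

The task prints n = 0, 1 and 2, then the abort message, after which the caller resumes with the statement following the task call.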

SystemVerilog also introduces the break command to exit the loop. This 
is more graceful than the disable statement as illustrated in the following 
example: 

module test_break; 
integer i; 
initial begin 
i = 10; 

while (i) begin 

i-- ; 

if (i == 5) begin 

break ; 

end else 

$display("i = %0d",i); 
end // while 
end // initial 

endmodule // test_break 

If the rest of the current iteration needs to be skipped on certain
conditions, SystemVerilog has added the continue statement, which jumps
directly to the end of the loop body. For example, in the following code, the
rest of the loop body is skipped for all odd values of the variable i.

module test_continue;

integer i;

initial begin
    i = 10;
    while (i) begin
        i--;
        if (i % 2) begin
            // $display("iodd = \t%0d", i);
            continue;
        end else
            $display("ieven = %0d", i);
    end // while
end // initial

endmodule // test_continue

The output of this test program displays: 

ieven = 8 
ieven = 6 
ieven = 4 
ieven = 2 
ieven = 0 

4.1.5 What are the differences between the looping constructs 
forever, repeat, while, for, and do-while ? 

The statements forever, repeat, while, and for are the looping statements
supported in Verilog-2001, and the do-while construct is introduced in
SystemVerilog. These statements fundamentally differ in how many times
the statements within the begin-end scope of the loop are executed. The
following bullets summarize these differences:

• forever: Executes the statements within its begin-end block forever,
without any variable to control it, until the simulation session
terminates. For example:

initial begin
    clk = 1;
    forever begin : clk_block
        #(clk_period/2) clk = ~clk;
    end
end

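Note that a forever loop has no exit condition of its own. Disabling
clk_block, which lies inside the loop, only ends the current iteration; the
loop stops only if a named block that encloses it is disabled.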

• repeat: Executes the statements within its begin-end block a fixed
number of times; the count is evaluated once at the beginning of the loop.
For example:

integer var1, i;

initial begin
    var1 = 8;
    i = 0;
    repeat (var1) begin : this_loop
        i = i + 1;
        $display("i = %0d", i);
    end
    $finish;
end

Note above that var1 has to be within parentheses, as (var1).
Without the parentheses, it is a syntax error. Since the loop count is
fixed before entering the repeat loop, an infinite loop cannot occur
through the repeat construct. The disable statement can be used to exit
the loop prematurely.

• while: Executes the statements within its begin-end block
repeatedly, until its expression becomes false. The loop expression
is also treated as false if it evaluates to an X or Z value. For example,

integer i;

initial begin
    i = 8;
    while (i) begin : this_loop
        i = i - 1;
        $display("i = %0d", i);
    end
    $finish;
end

A while loop can potentially end up being an infinite loop, if no
statement ever makes the expression of the while loop false. The
disable statement can be used to exit the loop prematurely.

• for: Executes the statements within its begin-end block repeatedly,
modifying its loop variable in steps on each pass, until the
loop condition evaluates to false, X or Z.

integer i;

initial begin
    for (i = 0; i < 8; i = i + 1) begin : loop1
        $display("i = %0d", i);
        // i = i + 1;
    end
    $finish;
end

The above code displays the values of i as 0, 1, 2, 3, 4, 5, 6, 7. Note that
a for loop also has the potential of entering into an infinite loop, if the
loop condition never becomes false. The disable
statement can be used to exit the loop prematurely.

Unlike the repeat loop, the loop variable can be manipulated within
the for loop. For example, in the above code, if the statement “i = i +
1” within the begin-end block is uncommented, then the display will
be of values 0, 2, 4, 6. This capability could, if used incorrectly, also
be a cause for entering an infinite loop. Thus, modifying the loop
variable in a for loop is not a best practice, and should be strongly
discouraged.

Another unique feature of the for loop is that it is the only looping
construct supported by many synthesis tools. The statements
within the for loop are replicated, once for each value of the looping
index. For this reason, the bounds of the for loop need to be fully
deterministic when the code is read by a logic synthesis tool.

• do-while: Executes the statements within its begin-end block, until
the expression within the while statement evaluates to X or Z or false.
The expression is evaluated at the end of the loop.

module test_dowhile;

integer i;

initial
begin
    i = 4'd2;
    do
    begin
        i++;
    end
    while (i <= 4'd15);

    $display("i = %0d", i); // displays i = 16
    $finish;
end

endmodule // test_dowhile

The key advantage of the do-while statement is that it guarantees the 
execution of the loop statements at least once before reaching the 
end of the loop. Hence, this avoids the duplication of the loop body 
outside the start of the loop before checking the entry of a normal 
while loop. 

4.1.6 What is the difference between based and unbased numbers? 

Based numbers are those which have a base specifier preceding the
actual number. For example, 16'habcd represents a 16-bit number with a
hexadecimal base. An unbased number has no base specified before it, and
represents a simple decimal integer. For example, the integer 23 is an unbased
number, since it has no base specification preceding it.

4.1.7 What does it mean to “short-circuit” the evaluation of an 
expression? 

Verilog supports numerous operators that have rules of associativity and
precedence. In some expressions, the result can be determined early on,
because one operand already decides the outcome regardless of the rest
of the expression. In that case, the entire expression need not be evaluated.
This is called short-circuiting the expression evaluation.

For example:

input in1, in2, in3, in4;
wire wire1, wire2;

assign wire2 = (in1 > in2) & (in3 | in4);

In the expression above, the result of the test (in1 > in2) is ANDed
with the result of (in3 | in4). If the result of (in1 > in2) is false
(1'b0), then tools can already determine that the result of the AND
operation will be 0. Thus, there is no need to evaluate (in3 | in4), and the
rest of the expression is short-circuited.

4.1.8 What is the difference between the logical (==) and the case
(===) equality operators?

The “==” operator specifies logical equality and the “===” operator
specifies case equality. The “==” logical equality operator is used to
model hardware, where comparisons to high-impedance or unknown values
would yield ambiguous results (an unknown or X in simulation). In other
words, if any of the operands of the “==” operator contain X or Z, then the
result is ambiguous and is an “X”. For example,

a = 2'b1x;
b = 2'b1x;

if (a == b)
    $display("reached if");
else
    $display("reached else");

Since at least one of the operands, a, contains X in one of its bits, the
result is X, and an X condition is treated as false by the if statement, so
the message “reached else” is displayed. Note that in this example, even
though a and b appear to be equal, the presence of an X in either will
result in a mismatch during the comparison operation.

The “===” case equality operator is intended for verification, where it
is important to test if a value is high-impedance or unknown. If any of the
operands of “===” contain X or Z bits, these bits are still compared
literally during evaluation and a Boolean result is reached, that is, the
result is a 1'b1 or a 1'b0. For example,

a = 2'b1x;
b = 2'b1x;

if (a === b)
    $display("reached if");
else
    $display("reached else");

In the above, “reached if” is displayed. The example works the same if X
is replaced with Z.
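
The raw results of the two operator families can be compared side by side, as in the following sketch. The module name is arbitrary; the last check is one common idiom for detecting unknown bits and is shown only as an illustration.

```verilog
module equality_demo;

  reg [1:0] a, b;

  initial begin
    a = 2'b1x;
    b = 2'b1x;

    // %b prints the 1-bit result of each comparison.
    $display("a ==  b : %b", a ==  b);   // x: the relation is ambiguous
    $display("a === b : %b", a === b);   // 1: x matches x, bit for bit
    $display("a !=  b : %b", a !=  b);   // x
    $display("a !== b : %b", a !== b);   // 0

    // The reduction XOR of a value with X or Z bits is X, which the
    // case equality operator can test for explicitly.
    if (^a === 1'bx)
      $display("a contains X or Z bits");
  end

endmodule
```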

4.1.9 What are the differences and similarities between the logical
(<<, >>) and the arithmetic (<<<, >>>) shift operators?

The logical shift operators are << and >>. They have
been present since Verilog-1995. The arithmetic shift operators are <<< and
>>>, which were introduced with Verilog-2001.

• Three of them, that is, the logical left shift (<<), arithmetic left shift
(<<<) and logical right shift (>>) operators, shift the bits left/right by
the number of bit positions specified by the right operand, and the
vacated bits are filled with zeros.

• The arithmetic right shift operator (>>>) fills the vacated bits
with 0 if the left operand is unsigned, and with the most significant bit if
the left operand is signed.

The following example illustrates all the above facts:

module test;

reg [7:0] tmp1, tmp2;        // default unsigned
reg signed [7:0] tmp3, tmp4; // signed

initial begin
    tmp1 = 8'b00001100;

    // logical unsigned shift left
    tmp2 = tmp1 << 4;
    $display("tmp2 = %b", tmp2); // tmp2 = 11000000

    // arithmetic unsigned shift left
    tmp2 = tmp1 <<< 4;
    $display("tmp2 = %b", tmp2); // tmp2 = 11000000

    tmp1 = 8'b10001100;

    // logical unsigned shift right
    tmp2 = tmp1 >> 2;
    $display("tmp2 = %b", tmp2); // tmp2 = 00100011

    // arithmetic unsigned shift right
    tmp2 = tmp1 >>> 2;
    $display("tmp2 = %b", tmp2); // tmp2 = 00100011

    tmp3 = 8'b10001100;

    // arithmetic signed shift right
    tmp4 = tmp3 >>> 2;
    $display("tmp4 = %b", tmp4); // tmp4 = 11100011
    // Note that the msb "1" got filled in all
    // vacated bits
end

endmodule // test

4.1.10 What is the difference between a constant part-select and an 
indexed part-select of a vectored net? 

The constant part-select and indexed part-select are two ways of
addressing the contiguous bits of a vectored net/reg, or any multi-bit variable
declaration.

The constant part select, as the name suggests, has a constant definition
for its upper and lower bounds. For example,

reg [msb : lsb]

where both msb and lsb must be constant expressions, that is, they have
fixed values at compile time itself.

In the case of the indexed part select, the width of the
part select (the right operand) must be constant, but the starting or ending
point of the part select (the left operand) can vary. For example:

module test;

reg [7:0] tmp1;  // descending order
reg [0:15] tmp2; // ascending order
integer i;

initial begin
    i = 0;
    tmp1[i +: 8] = 8'hab; // assigns to tmp1[7:0]
    $display("tmp1[7:0] = %0h", tmp1[0 +: 8]);

    i = 6;
    tmp1[i -: 4] = 4'ha; // assigns to tmp1[6:3]
    $display("tmp1[6:3] = %0h", tmp1[6 -: 4]);

    i = 0;
    tmp2[i +: 8] = 8'hef; // assigns to tmp2[0:7]
    $display("tmp2[0:7] = %0h", tmp2[0 +: 8]);

    i = 15;
    tmp2[i -: 8] = 8'hcd; // assigns to tmp2[8:15]
    $display("tmp2[8:15] = %0h", tmp2[15 -: 8]);
end

endmodule

Note the +: and -: syntax in the above usage. The +: indicates that
the part select bit numbers will increment from the value of the left operand
up to the width specified by the right operand (which must be constant). The
-: indicates that the part select bit numbers will decrement from the
value of the left operand up to the width specified by the right operand
(which must be constant). For the purpose of understanding this better, the
general expressions used for analysis are:

variable[base +: offset] and
variable[base -: offset]

to arrive at the variable[physical_msb : physical_lsb].

The following table summarizes the context and usage scenarios:

Table 4-16. Interpretation of physical MSB and LSB for indexed part select

Indexed part select    Physical MSB         Physical LSB

In the case of descending index like reg [7:0] tmp1;
base +: offset         base + offset - 1    base
base -: offset         base                 base - offset + 1

In the case of ascending index like reg [0:15] tmp2;
base +: offset         base                 base + offset - 1
base -: offset         base - offset + 1    base
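
A typical use of the indexed part select is stepping through a vector in fixed-width slices, which a constant part select cannot express because its bounds may not depend on a variable. The following is a minimal sketch; the module name, the variable word and its value are chosen only for illustration.

```verilog
module byte_slices;

  reg [31:0] word;
  integer    i;

  initial begin
    word = 32'hdeadbeef;

    // word[8*i +: 8] selects bits [8*i+7 : 8*i]: the base varies with i,
    // while the width (8) is constant.
    for (i = 0; i < 4; i = i + 1)
      $display("byte %0d = %h", i, word[8*i +: 8]);
  end

endmodule
```

The loop displays the bytes ef, be, ad and de, in that order, since word is declared with a descending index and byte 0 is the least significant one.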



4.1.11 Illustrate how memory indirection is achieved in Verilog. 

Indirection is a mechanism where a value read from memory is itself used
as the address (pointer) of another memory access. This is very commonly
used in software. In Verilog, the address of a memory location can be
specified as an expression, too. One way to specify this is in the form of
the value of another location in the same memory. For example, in the following:

new_value = my_memory[my_memory[10]];

suppose the value of my_memory[10] = 16'h1a00; then the result in
new_value is the same as that of specifying my_memory[16'h1a00]. Hence,
memory indirection can be achieved.
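
A small self-contained version of this idea is sketched below. The memory depth, the word width and the stored values are assumptions made only to keep the example short; they differ from the 16'h1a00 address used in the text.

```verilog
module mem_indirection;

  reg [15:0] my_memory [0:31];   // 32 words of 16 bits (assumed sizes)
  reg [15:0] new_value;

  initial begin
    my_memory[10] = 16'd20;      // location 10 holds the address 20
    my_memory[20] = 16'hbeef;    // location 20 holds the data

    // The inner read supplies the address for the outer read.
    new_value = my_memory[my_memory[10]];
    $display("new_value = %h", new_value);   // displays beef
  end

endmodule
```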

4.1.12 What is the logic synthesized when a non-constant is used as 
an index in a bit-select? 

A multiplexor is synthesized when a non-constant is used as an index in a 
bit-select. The following is an example: 

module indexed_mux (data_in, select, data_out);
input [7:0] data_in;
input [2:0] select;
output data_out;

assign data_out = data_in[select];
endmodule // indexed_mux

The above RTL code will get synthesized into a multiplexor with 8 bits
of data_in and 3 bits of select, as shown in Figure 4-1. Typically, the synthesis
tools will try to pick up an 8:1 multiplexor from the target library. If it is not
available, the multiplexor gets synthesized using logic gates.

[Figure: an 8:1 mux; data_in drives d[7:0], select drives sel[2:0], and the output z drives data_out]

Figure 4-1. A multiplexor generated out of non-constant bit select

4.1.13 How are string operands stored as constant numbers in a reg
variable?

Strings are stored as ASCII characters in 8 bit fields. For example, the
ASCII characters for lower case a-z are 8'h61 to 8'h7a. Hence, the reg
declaration that holds these fields needs to accommodate the correct number
of bits, with 8 bits for each character.

Since the ASCII characters are stored as 8 bit fields, modifying these bits
changes the value being displayed. The following example illustrates
this.

module test;

reg [5*8-1:0] tmp1; // 5 chars, 8 bits each

initial begin
    // assigns ASCII of each character
    tmp1 = "hello";

    // displays the value 'h68_65_6c_6c_6f
    $display("tmp1 = %0h", tmp1);

    // displays hello
    $display("tmp1 = %s", tmp1);

    // represents ASCII "y"
    tmp1[4*8-1 : 3*8] = 8'h79;

    // displays hyllo
    $display("tmp1 = %s", tmp1);
end

endmodule

SystemVerilog allows variables to be declared with the string type. This is a
very flexible mechanism: such a variable holds a dynamically allocated array
of characters, and the type comes with a set of associated functions for
operations such as returning the length of the string, character
manipulation, case conversion, etc.
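
A brief sketch of the string type follows. It needs a simulator running in SystemVerilog mode; the module name and the text used are arbitrary. The methods len(), toupper() and putc() are built-in string methods defined by the SystemVerilog standard.

```verilog
module string_demo;

  string s;

  initial begin
    s = "hello";
    $display("length = %0d", s.len());       // 5
    $display("upper  = %s",  s.toupper());   // HELLO

    s = {s, " world"};                       // the string grows as needed
    $display("s = %s, length = %0d", s, s.len());

    s.putc(0, "H");                          // replace the first character
    $display("s = %s", s);
  end

endmodule
```

Unlike the reg-based storage shown earlier, no width has to be declared up front, so the concatenation does not truncate or pad the text.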

4.1.14 How can I typecast an expression to control its sign?

The sign of an expression can be controlled by typecasting with two
system functions, namely $signed and $unsigned. These functions evaluate
the expression and return a value of the requested signedness. For example,

module test;

reg [3:0] regU;        // default unsigned reg variable
reg signed [3:0] regS; // signed reg variable

initial begin
    regU = -2; // stored as unsigned value
    regS = -5; // stored as signed value

    // Unsigned math
    $display("regU+regS = %0d", (regU+regS));

    // Signed math
    $display("$signed(regU)+regS = %0d",
             ($signed(regU)+regS));
end

endmodule

As illustrated above, casting is very beneficial in the middle of
compound operations.

4.1.15 What are the pros and cons of using hierarchical names to 
refer to Verilog objects? 

Verilog allows the access of variables by using hierarchical paths. For
example, the status net at the top level can be assigned directly with the
value of a signal deep in the hierarchy underneath:

assign status = top.DUT.U_core.U_CSM.status;

The advantage of using hierarchical names is:

• It is easy to debug the internal signals of a design, especially if they
are not a part of the top level pinout.

The disadvantages of using hierarchical names are:

• Sometimes, during synthesis, these hierarchical names get
ungrouped or dissolved or renamed, depending upon the synthesis
strategy and switches used, and hence, will cease to exist. In that
case, special switches need to be added to the synthesis compiler
commands, in order to maintain the hierarchical naming.

• If the Verilog code needs to be translated into VHDL, the
hierarchical names are not translatable.

4.1.16 Does Verilog support a power (a raised to the power b) operator?

Yes. Verilog supports the power operation by using two asterisks, back to
back (**). This operator was added with the Verilog-2001 release. For example,

module powerof (in1, out1);
parameter power = 2;
input [1:0] in1;
output [3:0] out1;

assign out1 = in1 ** power;

endmodule // powerof

A value of 2 would mean out1 = in1 * in1, that is, the value getting
multiplied by itself. Simulation, however, works for powers other than 2, as
well.

4.1.17 What is the main limitation of fork- join in Verilog, and how 
is this overcome in SystemVerilog? 

The main limitation of the fork-join construct in Verilog is that it is static,
that is, the execution of the code beyond the join is suspended until all the
processes within the fork-join are completed. For example, in the following
code, the last $display statement gets executed only after 10 time units,
although the first process is completed in 5 time units:

module fork_join_tests;

integer out_val;

initial begin
    fork
        begin // First process
            #5 $display("exit first process at t = %0d",
                        $time);
        end
        begin // Second process
            #10 $display("exit second process at t = %0d",
                         $time);
        end
    join
    $display("exit fork join at t = %0d", $time);
end

endmodule // fork_join_tests

The above code produces the following display outputs:

exit first process at t = 5
exit second process at t = 10
exit fork join at t = 10

SystemVerilog adds two new keywords for joining parallel processes:
join_any and join_none.

When the join in the code above is replaced by join_any, then the
following display outputs are produced:

exit first process at t = 5
exit fork join at t = 5
exit second process at t = 10

Notice that the fork-join_any exits after the first process gets completed,
that is, at 5 time units.

When the join is replaced by join_none, then the following display
outputs are produced:

exit fork join at t = 0
exit first process at t = 5
exit second process at t = 10

Notice that the fork-join_none exits after spawning both the processes,
without waiting for any of them to be completed, that is, it exits at 0 time
units.

Unlike join, the join_any and join_none constructs of SystemVerilog do not
hold the parent process until all of the forked processes are completed.
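
For reference, a compact join_any variant of the example is sketched below. It requires SystemVerilog mode; the module name is arbitrary. The wait fork statement, also part of SystemVerilog, is added here only to show one way of synchronizing with the remaining process later on.

```verilog
module fork_join_any_test;

  initial begin
    fork
      #5  $display("exit first process at t = %0d", $time);
      #10 $display("exit second process at t = %0d", $time);
    join_any
    // Reached as soon as the first process completes.
    $display("exit fork join_any at t = %0d", $time);

    // Optional: block here until the second process has finished too.
    wait fork;
    $display("all processes done at t = %0d", $time);
  end

endmodule
```

Replacing join_any with join_none would let the parent continue at time 0, with both processes still pending.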

4.1.18 Can I return from a function without having it disabled? 

It is illegal to disable a function in Verilog-1995 and Verilog-2001. However,
SystemVerilog has introduced the keyword return, which skips the rest of the
lines of the function and returns to the block that called the function.
For example, in the following code, the function returns early if in1
is greater than in2.

function [31:0] my_func;
    input [31:0] in1;
    input [31:0] in2;
    reg tmp_reg;
    if (in1 > in2) begin
        $display("my_func returning back");
        return;
    end else
        my_func = in1 + in2;
endfunction

module func_multibit;

reg [31:0] result;

initial begin
    result = my_func(3, 4);
    $display("result = %0d", result);
    result = my_func(4, 3);
end

endmodule // func_multibit

The above code would produce the output displays of:

result = 7
my_func returning back

4.1.19 What is strobing? How do I selectively strobe a net?
Strobing is a facility defined in Verilog by which simulation data on 
selected nets or variables can be captured at the end of the current simulation 
time instant, after all the events scheduled for this time have occurred, and 
just before the simulation time is advanced. In Verilog, strobing is facilitated 
by the $strobe system call. Syntactically, this system call is very similar to 
the $display system call. An example of the $strobe system call follows: 

always @(negedge system_clock) 

$strobe ("Time = %0d, rx_active = %b rx_data = %h" , 
$time, rx_active, rx_data [7 : 0] ) ; 

Functionally, the $strobe system call creates internal monitoring events 
of its arguments, which are re-enabled at every user-specified time-step. 

Variants of $strobe include $strobeh (hexadecimal formatted), $strobeo
(octal formatted), and $strobeb (binary formatted). All of these system calls
print their results on the standard output device. For printing the strobed
outputs to a specific file, instead of the standard output, there exist the
file-specific variants of these system calls: $fstrobe, $fstrobeb, $fstrobeh, and
$fstrobeo. These calls take an additional first argument for the
file handle. For example:

integer file_handle;

initial
begin
    file_handle = $fopen("vectors.stb");
end

always @(negedge system_clock)
begin
    $fstrobe(file_handle,
        "Time = %0d, rx_active = %b rx_data=%h",
        $time, rx_active, rx_data[7:0]);
end

In addition to $strobe, Verilog has another system call,
$monitor, which is used for taking snapshots of signal changes during
simulation. Like $strobe, the $monitor system call also creates internal
monitoring events of its arguments. However, unlike $strobe, a $monitor
call cannot create other simulation events. For example, we can create a
time-event (that is, a simulation event) for a $strobe:

always @(posedge clock)
    $strobe(...arguments...);

In this example, the values of the argument signals of $strobe are printed
out to the standard output at every posedge of the clock signal.

However, $monitor effectively “stands by itself”. Example:

initial 

begin 

$monitor ( 

"Time = %0d: rx_active = %b, rx_data = %h" , 

$time, rx_active, rx_data [7 : 0] ) ; 

end 

In this example, every change on the rx_active and rx_data 
signals causes $monitor to print out the changes to the standard output 
device. 

From this example, it is clear that a $monitor call for a given set of
arguments should be issued only once in a simulation. If $monitor is called
more than once, then the most recent invocation overrides all previous calls.
In the following example, only the signals tx_valid and tx_data will be
monitored:

initial
begin
    // monitor for receive activity
    $monitor(
        "Time = %0d: rx_active = %b, rx_data = %h",
        $time, rx_active, rx_data[7:0]);

    // monitor for transmit activity:
    // REPLACES the previous $monitor.
    $monitor(
        "Time = %0d: tx_valid = %b, tx_data = %h",
        $time, tx_valid, tx_data[7:0]);
end

Finally, as with $strobe, there are variants of the $monitor system call as
well: $monitorh, $monitoro, $monitorb, $fmonitor, $fmonitorh,
$fmonitoro, and $fmonitorb, with the same meanings as the
corresponding $strobe variants.

4.1.20 Summarize the main differences between $strobe and 
$monitor. 

The differences between $strobe and $monitor are summarized in the 
following points: 

• $strobe can be used to create new simulation events, simply by
encapsulating the $strobe system call within a simulation construct
that moves simulation time, such as @(posedge clock), @(negedge
clock), @(any_signal), etc. There can exist multiple $strobe system
calls at the same time, with identical or different arguments.

• $monitor stands alone. A given set of arguments of $monitor form 
their own unique sensitivity list. Only one $monitor call can be 
active at any time. Each call to $monitor replaces any previous 
call(s) to $monitor. 
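The two points above can be seen side by side in a small testbench. The following is only a minimal sketch: the module name strobe_vs_monitor, the count register, and the clock period are assumptions made for illustration and do not come from any particular design.

```verilog
module strobe_vs_monitor;
    reg       clock;
    reg [3:0] count;

    initial begin
        clock = 0;
        count = 0;
        // The single active $monitor: prints whenever count changes
        $monitor("monitor : t=%0d count=%0d", $time, count);
        #40 $finish;
    end

    always #5 clock = ~clock;

    always @(posedge clock)
        count <= count + 1;

    // Two independent $strobe calls coexist, each tied to its own event
    always @(posedge clock)
        $strobe("strobe A: t=%0d count=%0d", $time, count);

    always @(negedge clock)
        $strobe("strobe B: t=%0d count=%0d", $time, count);
endmodule
```

Both $strobe calls keep printing for the whole run, each at its own clock edge, whereas adding a second $monitor call would silence the first one.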

4.1.21 How can I selectively enable or disable monitoring? 

$monitor can be selectively enabled or disabled by the $monitoron and 
the $monitoroff system calls, respectively. The $monitoron and $monitoroff 
system calls affect only the most recent call to $monitor. 
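As a rough illustration of the above, the following sketch suspends monitoring around a window of activity that is not of interest. The module name monitor_gate, the rx_data values, and the delays are assumptions chosen only for this example.

```verilog
module monitor_gate;
    reg [7:0] rx_data;

    initial begin
        $monitor("t=%0d rx_data=%h", $time, rx_data);
        rx_data = 8'h00;
        #10 rx_data = 8'h11;   // reported by $monitor

        $monitoroff;           // suspend the active $monitor
        #10 rx_data = 8'h22;   // change is not reported
        #10 rx_data = 8'h33;   // change is not reported

        $monitoron;            // resume the active $monitor
        #10 rx_data = 8'h44;   // reported again
        #10 $finish;
    end
endmodule
```

Note that the Verilog standard describes $monitoron as producing a display as soon as monitoring is re-enabled, so the current value may be printed once at that point even without a new change.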

4.1.22 How can I specify arguments on the Verilog simulator’s 
command line? 

User defined command line arguments to the Verilog simulator are 
usually preceded by a “+”, and are generally referred to as “plusargs”. For 
example, a Verilog command line may look like this: 

<my_Verilog_simulator> +MYGPA=4.0 +MYSCHOOL=geek_factory 

Here, the plusargs are MYGPA and MYSCHOOL. The values assigned
to these plusargs are “4.0” for MYGPA and “geek_factory” for
MYSCHOOL.

Verilog defines system tasks for determining

• which plusargs are defined: $test$plusargs

• what value is assigned to each plusarg: $value$plusargs

Continuing with the above example: 

$test$plusargs("MYSCHOOL")
// would return a non-zero integer,

$test$plusargs("MYGPA")
// would also return a non-zero integer, whereas

$test$plusargs("MYSPECIALIZATION")
// would return an integer zero.

Therefore, we can use $test$plusargs in a Verilog testbench to query whether
particular plusargs were defined on the command line.

Once it is known which plusargs were defined on the command line, the
value assigned to each plusarg can be queried as well, using the
$value$plusargs system task.

Again, continuing with the previous example:

real gpa_value;

$value$plusargs("MYGPA=%f", gpa_value);

would result in

gpa_value = 4.0

Similarly, the following snippet of code:

reg [8*80:1] name_of_school;

$value$plusargs("MYSCHOOL=%s", name_of_school);

would result in

name_of_school = geek_factory;



In other words, $value$plusargs is analogous to the sscanf() function in C.
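Putting the two system tasks together, a testbench might query the plusargs from the example command line as sketched here. The module name plusargs_demo and the messages are assumptions for illustration; the sketch relies on $value$plusargs returning a non-zero value when a matching plusarg is found.

```verilog
module plusargs_demo;
    real         gpa_value;
    reg [8*80:1] name_of_school;

    initial begin
        // First check whether the plusarg is present at all
        if ($test$plusargs("MYGPA")) begin
            if ($value$plusargs("MYGPA=%f", gpa_value))
                $display("MYGPA = %f", gpa_value);
        end
        else begin
            $display("MYGPA was not specified on the command line");
        end

        // The return value of $value$plusargs can also be used directly
        if ($value$plusargs("MYSCHOOL=%s", name_of_school))
            $display("MYSCHOOL = %0s", name_of_school);
        else
            $display("MYSCHOOL was not specified on the command line");

        $finish;
    end
endmodule
```

Checking the return value before using the destination variable avoids working with an unassigned value when the plusarg is missing.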
4.1.23 Can the `define be used for text substitution through
variable instead of literal substitution only?

Typically the `define text macro has been used for literal text
substitution only. For example,

`define width 8

// substitutes `width with 8
wire [`width-1:0] wire1;

In the above example, `width is literally replaced by 8 wherever it is
used. This was the usage syntax in Verilog-1995. However, from Verilog-2001
onwards, a text macro can also take arguments, which are substituted
into the macro text wherever the macro is used. For example,

`define pos_clk(in1) @(posedge in1)

module test_define_text;
wire clk;

initial
    `pos_clk(clk);

endmodule // test_define_text

In the above example, wherever `pos_clk is called, it is
substituted by @(posedge clk), with clk being the argument
passed to the text macro.

Note: During text substitution, it is important to pay attention to white
space, too. For example, in the define in the above example, if white space
exists between pos_clk and (in1), the replacement wouldn’t work.
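The same mechanism works for macros with more than one argument. The following sketch uses a hypothetical macro named MAX, which is not part of the original example; the extra parentheses around each argument are a defensive habit, since the substitution is purely textual.

```verilog
// Hypothetical two-argument text macro; note there is no white space
// between the macro name and the opening parenthesis.
`define MAX(a, b) (((a) > (b)) ? (a) : (b))

module macro_arg_demo;
    integer x, y;

    initial begin
        x = 3;
        y = 7;
        // `MAX(x, y) expands to (((x) > (y)) ? (x) : (y))
        $display("max = %0d", `MAX(x, y));
        $finish;
    end
endmodule
```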

SUMMARY 

This chapter discussed miscellaneous topics that couldn’t be allocated 
into the rest of the chapters. These topics are spread across the different 
sections of the Verilog language. 




Chapter 5 

COMMON MISTAKES 



COMMON VERILOG CODING MISTAKES 

This chapter describes different errors that aren’t detected at
compile time, but show up either as functional problems or as run-time
problems during simulations. The list presented is by no means exhaustive,
but it captures most of the common mistakes seen during the development
phase. Each of these mistakes is illustrated with an example, along with
possible workarounds to prevent it from occurring.

5.1 Some common errors that are not detected at 
compile-time 

These are mistakes that are not detected at compile time; that is,
the code is legal Verilog syntax, but it ends up being either functionally
incorrect, or causes a hang during simulation due to deadlock, live-lock,
etc.

5.1.1 What are some ways a race condition can get created, and 
how can these race conditions be avoided? 

A race condition happens when two processes assign values to the same
variable, or one assigns it while another reads it, at the same event time.
Race conditions also happen due to the incorrect coding style of using
blocking assignments in clocked processes. Because of the bad coding style,
the destination variable may not yet have been scheduled for update when it
is read, so an incorrect value is picked up, causing a race between the
source and destination variables. The receiving/retrieving agent, which
could be another variable, or a system task like $display that uses this
variable, would see the value as per its own scheduling order. In the
following example, the same variable is initialised in two blocks:

module race;
reg a;
wire b;

initial begin // start at time 0
    a = 1;
end

initial begin // start at time 0
    a = 0;
end

assign b = a;

initial begin // start at time 0
    $display("a=%0b, b=%0b", a, b);
end

endmodule

The above code displays either a 0 or 1 for the variable "a", and it
typically follows the value that was last assigned to it. In general, this is
heavily dependent on the implementation of the simulator. The value of "b"
might be shown as X or 0 or 1, depending on the event ordering of the
simulator.

One of the recommendations is to avoid driving variables from multiple 
sources. If a variable needs to be a function of information from multiple 
processes, then the assignment to the variable must happen in one place, 
with the control variables on the RHS of the assignment. 
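As a minimal sketch of this recommendation, the two processes below each own a control variable, and a single always block is the only place where a is assigned. The names set_req and clr_req, the delays, and the priority given to clr_req are assumptions made for this illustration.

```verilog
module single_driver;
    reg set_req;   // control written only by the first process
    reg clr_req;   // control written only by the second process
    reg a;         // assigned in exactly one place

    initial begin
        set_req = 1'b0;
        #10 set_req = 1'b1;
    end

    initial begin
        clr_req = 1'b0;
        #20 clr_req = 1'b1;
    end

    // The only assignment to a; both controls appear on the RHS
    always @(set_req or clr_req)
        a = set_req & ~clr_req;

    initial begin
        $monitor("t=%0d set_req=%b clr_req=%b a=%b",
                 $time, set_req, clr_req, a);
        #30 $finish;
    end
endmodule
```

Because a has a single writer, its value no longer depends on the order in which the simulator happens to execute the two controlling processes.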

The other recommendation is to initialise the variable in-line. Beginning
with Verilog-2001, there is a convenient way to initialise a variable
in-line, in the declaration itself. For example, the variable a, above, can
be initialised in only one place as:

reg a = 0;

The SystemVerilog standard further enhances the above initialisation
procedure, such that all in-line initialisations are guaranteed to happen
before any other events execute at the start of simulation. See also
FAQ 5.1.15 for additional side effects related to race conditions.

5.1.2 Illustrate how the infinite loops get created in the looping 
constructs like forever, while and for. 

Infinite loops are one of the common causes of a hang in a
simulation. If a loop doesn’t have a finite end value for the looping
variable, then the loop never terminates. The following are common examples:

reg done;

initial begin // start at time 0
    done = 0;
    while (~done) begin
        #5 $display("Entered loop at t=%0d", $time);
    end
end

Note that the above while loop only exits if some other process changes
done to 1. If this never happens, then the loop never terminates, even
though the code is syntactically correct.

The fix for critical loops is to add a check within the loop that
disables the loop when the loop variable exceeds a limit. Another way is
to add a watchdog timer in parallel with the loop in a fork-join. This would
stop the simulation if the loop doesn’t terminate within a specific number of
iterations or a specific amount of time.
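The watchdog approach can be sketched in plain Verilog with a named fork-join block and the disable statement. This is only one possible arrangement; the block name guarded_loop and the limit of 100 time units are assumptions for illustration.

```verilog
module watchdog_demo;
    reg done;

    initial begin
        done = 0;

        fork : guarded_loop
            // Branch 1: the loop that might never terminate
            begin
                while (~done)
                    #5 $display("Entered loop at t=%0d", $time);
                disable guarded_loop;   // normal exit: cancel the watchdog
            end

            // Branch 2: the watchdog timer (limit is an assumed value)
            begin
                #100 $display("Watchdog expired at t=%0d", $time);
                disable guarded_loop;   // abort the loop branch
            end
        join

        $finish;
    end
endmodule
```

Whichever branch finishes first disables the whole fork-join block, so execution always continues with the statement after join, here $finish.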

With SystemVerilog, an assertion statement can be used instead of 
having to write a watchdog timer. 

reg [31:0] i;

initial begin // start at time 0
    for (i = 0; i >= 0; i = i + 1) begin
        #5 $display("Entered loop at t=%0d", $time);
    end
end

Note that the above for loop wouldn’t terminate, since i is unsigned and the
termination condition of i >= 0 is therefore always met. The fix for this is
similarly the addition of a watchdog timer or use of the assertion feature
in SystemVerilog.

reg clk;

initial begin // start at time 0
    forever #5 clk = ~clk;
    $display("After forever at t=%0d", $time);
end

Note that the $display statement above is never reached, since the forever
statement never completes. Therefore, a forever statement should be the
last statement in a procedural block. Any statements intended to follow the
forever loop should be placed in a different procedural block.

5.1.3 Illustrate the side-effects of specifying a function without a 
range. 

It is common to use a function to assign a value. A mistake can happen
if a range for the function return value is not specified. If a range is not
specified, Verilog will assume a 1-bit return value. If a multi-bit return
value is calculated in the function, only the least significant bit is
returned. In the following example, the correct result is 7 if the range of
[31:0] is specified in the function definition. Without a range, however,
only the least significant bit of the value is returned, which is 1.

reg [31:0] result;

function my_func; // returns only the lsb
    input [31:0] in1;
    input [31:0] in2;
    reg tmp_reg;
    my_func = in1 + in2;
endfunction

initial begin
    result = my_func(3, 4);
    $display("result = %0d", result);
end

There would not be any compilation errors for the function definition. It
is a runtime functional error. To correct the error, the function should be
declared with a range of sufficient size for the return value, such as:

function [31:0] my_func; // returns the correct range

5.1.4 Illustrate how the error of passing arguments to a function
in incorrect order is eliminated in SystemVerilog.

The arguments to a function call have to be in exactly the same order as
the input arguments defined in the function. Passing arguments in an
incorrect order to a function call results in incorrect functionality,
although it is syntactically correct. For example, a function call with two
inputs, say, in1 and in2, could result in incorrect functionality if the
variables passed to the function were not in the right order.
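The effect of swapping positional arguments can be seen with a small, hypothetical function named diff, which is not part of the original example. Both calls compile cleanly, but only one of them matches the intent.

```verilog
module arg_order_demo;
    // Hypothetical function: the result depends on argument order
    function [7:0] diff;
        input [7:0] in1;
        input [7:0] in2;
        diff = in1 - in2;
    endfunction

    initial begin
        $display("diff(9, 4) = %0d", diff(8'd9, 8'd4)); // prints 5
        $display("diff(4, 9) = %0d", diff(8'd4, 8'd9)); // prints 251 (wraps in 8 bits)
        $finish;
    end
endmodule
```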

SystemVerilog has an enhancement to functions that eliminates this
ambiguity, by bringing in a feature to both task and function calls, wherein
the arguments can be passed explicitly by name, rather than
implicitly by order. In the following code, the arguments of the function call
are connected to their right sources and destinations, although the order in
which they are passed is not the order in which the function
definition declares them.

module funct_output (in1, in2, out1, out2,
                     out3, out4);
input  [1:0] in1, in2;
output [1:0] out1, out2, out3, out4;

reg [1:0] out1, out2, out3, out4;

// void, function doesn't return anything
function void arith;
    input  [1:0] in1, in2;
    output [1:0] out1, out2, out3, out4;
    begin
        out1 = in1 & in2;
        out2 = in1 | in2;
        out3 = in1 ^ in2;
        out4 = in1 % in2;
    end
endfunction

always_comb
begin
    arith (
        .in1  (in1),   // The order of arguments to a
        .out1 (out1),  // function call is immaterial.
        .in2  (in2),   // Inputs and outputs can be in
        .out3 (out3),  // any order as long as they
        .out2 (out2),  // are connected to the right
        .out4 (out4)   // source or destination
    );
end

endmodule

The result of the above function call, with arguments passed by name, is
the same as it would be with arguments passed implicitly by order.
Hence, this eliminates the possibility of inadvertent errors during function
calls.



5.1.5 Using tri-state logic inside a chip 

The presence of internal tri-state logic is a critical consideration for
power-sensitive products. Normally a multiplexor should be used in place of
tri-state logic. However, if tri-state logic remains in the RTL, it is not a
compilation error. Synthesis tools sometimes warn the user. Linting
tools also detect this condition and report it to the user.

wire a, b, c;

assign b = (a == 1) ? c : 1'bz;
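For comparison, a multiplexor-based version of the same kind of logic is sketched here. The second data source d and the module name mux_not_tristate are assumptions; in a real design, the alternative source would be whatever other driver previously shared the tri-state net.

```verilog
module mux_not_tristate (a, c, d, b);
    input  a;      // select
    input  c;      // source that was previously driven through the tri-state
    input  d;      // assumed alternative source replacing the 1'bz branch
    output b;

    // b always has a defined driver; no net is ever left at high impedance
    assign b = (a == 1'b1) ? c : d;
endmodule
```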

5.1.6 Illustrate the side effects of not having a final else clause in an 
if-else construct. 



In a combinatorial block, not having a final else clause would result in a
latch when synthesized. This is a fully legal construct in Verilog, and would
compile without error. For example,

reg tmp;

always @(enable, in1) begin
    if (enable)
        tmp = in1;
end

In the above code, there is no else clause in the combinatorial block, and
this would result in a latch when synthesized. Many synthesis and linting
tools detect and report this very well. If the latch is not to be inferred,
then an else block is required.
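A latch-free version of the same block might look like the following sketch. The module wrapper no_latch and the choice of 1'b0 as the value driven when enable is low are assumptions; the appropriate default depends on the design.

```verilog
module no_latch (enable, in1, tmp);
    input  enable, in1;
    output tmp;
    reg    tmp;

    always @(enable or in1) begin
        if (enable)
            tmp = in1;
        else
            tmp = 1'b0;   // assumed default; tmp is now assigned on every path
    end
endmodule
```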

If the intent is to produce a latch, any synthesis and lint warnings are
false alarms. The warnings can be avoided by using the always_latch keyword
in SystemVerilog as follows:

always_latch
    if (enable)
        tmp <= in1;



5.1.7 What is the side effect of not having a default clause in a case
construct?

This is another common cause of unintentional latches.
The default case is necessary if all the cases are not fully specified. For
example,

module lower (in1, in2, opcode, out1);
input  [1:0] in1, in2, opcode;
output [1:0] out1;

reg [1:0] out1;

always @(in1 or in2 or opcode) begin
    case (opcode)
        2'b00 : out1 = in1 & in2;
        2'b01 : out1 = in1 | in2;
        2'b10 : out1 = in1 ^ in2;
        // 2'b11 : out1 = in1 % in2;   // uncommenting
                                       // either of these two
        // default : out1 = in1 & in2; // lines will avoid
                                       // a latch
    endcase
end

endmodule

The presence of the default statement assigns the out1 variable
for all case items that are not specified. Good linting tools
report the presence of latches, and the synthesis elaboration log file also
needs to be checked for latch inference.
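Another commonly used way to get the same effect, not shown in the example above, is to assign a default value to the output before the case statement. The following sketch assumes a default of 2'b00 and uses the hypothetical module name lower_default.

```verilog
module lower_default (in1, in2, opcode, out1);
    input  [1:0] in1, in2, opcode;
    output [1:0] out1;
    reg    [1:0] out1;

    always @(in1 or in2 or opcode) begin
        out1 = 2'b00;                 // assumed default, overridden below
        case (opcode)
            2'b00 : out1 = in1 & in2;
            2'b01 : out1 = in1 | in2;
            2'b10 : out1 = in1 ^ in2;
            // 2'b11 is not listed, so out1 keeps the default value
        endcase
    end
endmodule
```

Since out1 is assigned on every pass through the block, no storage is implied even though the case items are incomplete.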

5.1.8 Illustrate with an example how unintentional deadlock situations
can happen during simulation.

When two processes interact through a handshake interface, it is
important to ensure that the implementation cannot become deadlocked.
A deadlock is a situation in which one process is waiting for the other
process to enable it, while the other process is in turn waiting to be
enabled by the first. The code could be a syntactically correct
implementation, and still have a deadlock situation. The scenario can happen
in both synchronous and asynchronous designs. A simple asynchronous
example is given in the following, to demonstrate how deadlock
occurs.

module deadlock;
reg a, b, c;

initial begin
    a <= 0; // source of deadlock
    b <= 0;
    wait (b == 1'b1);
    $display("Variable b detected as 1");
end

always @(a) begin
    if (a == 1'b1) begin
        $display("Variable a detected as 1");
        b = 1;
        $display("Variable b assigned to 1");
    end
end

endmodule

In the above example, the messages for variables a and b being detected
will never be displayed, since the variable a has been initialised to 0.
If the variable a is instead initialised to 1, then all the messages are
displayed. The above example is an illustration of a deadlock scenario,
which can be difficult to catch in a larger implementation.

SystemVerilog eliminates the race condition with always_comb, which
automatically triggers once at time 0, after all pending time 0 assignments
are executed.

5.1.9 Having a programmed loop that does not move simulation time

When any form of loop executes without any delay between iterations, and
without a defined termination criterion, the simulator hangs at that loop.
The delay can be introduced through either @(posedge clk) or # delay
constructs. Since nothing in the loop advances time, simulation time
does not move ahead either. For example, in the following,
the while loop runs forever with zero delay between iterations, and would
cause a hang.

module while_hang;

initial begin
    while (1) begin : loop_while
        // No construct to advance the time
    end // loop_while

    $finish; // this line is never reached, and
             // the simulation hangs
end

endmodule

This kind of zero-delay loop is especially a problem if another process is
reading the variables assigned in the loop. In the above example, the
variables assigned in the loop might not have a chance to propagate to the
DUT. This causes a write-write ... write-read race condition. This scenario
typically occurs in a testbench.

One of the common workarounds to detect such unintentional hang
scenarios is to check the difference in time between iterations
just before the end of the loop. If the time didn’t advance, then the loop
should be exited. The following is an extension of the above example with
the time check between the iterations of the loop:

module test_repeat;
integer i;
time intime, outtime; // time tracking variables

initial begin
    while (1) begin : loop_while
        intime = $time; // beginning of the loop
        // @(posedge clk) // time advancer
        // ... other statements within the while loop
        outtime = $time; // end of the loop
        if (intime == outtime) begin
            $display("Hang detected: intime = %0d, outtime = %0d",
                     intime, outtime);
            break; // SV feature
        end
    end
end

endmodule

Having checks like the above in loops that are suspected of hanging
makes debugging easier.
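Where the SystemVerilog break statement is not available, a similar check can be sketched in plain Verilog by wrapping the loop in a named block and disabling that block. The module name zero_delay_guard, the block name guarded_loop, and the iterations counter are assumptions added for illustration.

```verilog
module zero_delay_guard;
    time    intime, outtime;   // time tracking variables
    integer iterations;

    initial begin
        iterations = 0;

        begin : guarded_loop
            while (1) begin
                intime = $time;                // beginning of the iteration
                // no timing control here, so time cannot advance
                iterations = iterations + 1;
                outtime = $time;               // end of the iteration
                if (intime == outtime) begin
                    $display("Hang detected after %0d iteration(s) at t=%0d",
                             iterations, $time);
                    disable guarded_loop;      // leaves the whole loop
                end
            end
        end

        $finish;   // reached, because the named block was disabled
    end
endmodule
```

The named block encloses the entire while loop; disabling a block that is only the loop body would merely skip to the next iteration instead of leaving the loop.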

5.1.10 Illustrate the side effect of leaving an input port unconnected
that influences the logic driving an output port

Leaving an input pin floating will cause a 'bz to be propagated during
functional simulation. During synthesis, it will cause the gate optimiser to
optimise away the logic that depends on the floating input. For
example,

module lower (in1, in2, out1, out2);
input  [7:0] in1, in2;
output [7:0] out1, out2;

assign out1 = in1 & in2;
assign out2 = in1 | in2;
endmodule // lower

module upper1 (u_in1, u_in2, u_out1, u_out2);
input  [7:0] u_in1, u_in2;
output [7:0] u_out1, u_out2;

reg  [7:0] reg1;
wire [7:0] wire1;

// Instantiating lower with in1 floating
lower U1 (
    .in1  (),        // input is left floating
    .in2  (u_in2),
    .out1 (u_out1),
    .out2 (u_out2)
);

endmodule // upper1

The above logic would get synthesized such that u_in1 is not
connected to any logic within its hierarchy, and in2 is directly connected to
out2 in the lower hierarchy (since out2 is an OR of in2
with ‘nothing’).

5.1.11 Illustrate the side effect of not connecting all the ports during 
instantiation 

Unconnected input ports evaluate to ‘z’. If the input port is used in an if
condition with the logical equality operator (==), then the comparison
evaluates to unknown (x), which the if statement treats as false. For example,

module mod1 (in1, en);
input in1;
input en;

always @(in1, en) begin
    #5 if (en == 1'b1)
        $display("Reached then");
    else
        $display("Reached else");
end

endmodule // mod1

module mod1_top;
reg in1;

initial in1 = 1'b1;

// Instantiating mod1 module and port en is
// left floating
mod1 U0 (
    .in1 (in1),
    .en  ()
);

endmodule // mod1_top

The above example will display “Reached else”, since there was nothing
connected to port en. Most simulators issue a warning message if
input ports are unconnected or left floating.

The above mis-connections can be detected through a monitor module
that peeks at all the inputs and outputs of the DUT. An example
snippet of code to detect floating inputs and outputs is:

module mod1_mon (in1, en);
input in1, en;

initial begin
    if (in1 === 1'bz)
        $display("in1 is seen floating at t=%0d", $time);
    if (en === 1'bz)
        $display("en is seen floating at t=%0d", $time);
end

endmodule // mod1_mon

Note that the check for floating inputs is done once through an initial
block, since the state of the input is not expected to change dynamically. If
an input may be floated intentionally during the simulation, then the above
code needs to be placed in an always block. The above monitor module can
be instantiated within the mod1_top module as:

mod1_mon U2 (
    .in1 (in1),
    .en  (en)
); // mod1_mon

With the en input left floating in the testbench, the display outputs of the
above are:

en is seen floating at t=0 
Reached else

5.1.12 Illustrate the side effect of forgetting to increase the width of 
state registers as more states get added in a state machine. 

Normally the state register is sized to the nearest power of 2 at or above
the number of states; that is, for a 5-state state machine, the state
register would be [2:0], or 3 bits wide. The state values would range from
3'b000 to 3'b111.

As the number of states in the state machine increases and goes past 3'b111,
the width of the state variable also needs to be increased to [3:0], etc.
Suppose the additional states had values like 4'b1000, 4'b1001, etc.,
and the width of the state register remained at [2:0]. The state value of
4'b1001 would be erroneously truncated to 3'b001, and the state value of
4'b1101 to 3'b101.

It is syntactically correct to have a state register that is narrower
than the state values assigned to it, and this would not cause any
error during compilation. But it would lead to functionally incorrect
results. Some of the good linting tools would catch this type of problem.
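The truncation can be reproduced with a few lines of Verilog. In this sketch, the state name S_WAIT and the module name state_width_demo are hypothetical; the register deliberately keeps the old 3-bit width.

```verilog
module state_width_demo;
    // A newly added state whose encoding needs 4 bits (hypothetical name)
    localparam [3:0] S_WAIT = 4'b1001;

    reg [2:0] state;   // width was not increased to [3:0]

    initial begin
        state = S_WAIT;                  // the MSB is silently dropped
        $display("state = %b", state);   // prints: state = 001
        $finish;
    end
endmodule
```

Some simulators or lint tools may flag the width mismatch in the assignment, but the code is legal and runs.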

SystemVerilog has a new feature of enumerated state variables that
helps in resolving this issue. The keyword enum is used for this purpose;
it automatically assigns incrementing values to the state names as they are
added. The following example illustrates the use of this feature:

module enumfm (clk, reset_n, rd_n, ready, done,
               out1);
input clk, reset_n, rd_n, ready, done;
output out1;

enum {idle, read, write, wait4rdy} current_state,
                                   next_state;

always_ff @(posedge clk or negedge reset_n)
    if (reset_n == 1'b0)
        current_state <= idle;
    else
        current_state <= next_state;

always_comb
begin
    next_state = current_state;
    case (current_state)
        idle: if (~rd_n)
                  next_state = read;
              else
                  next_state = write;
        read: if (!ready)
                  next_state = wait4rdy;
              else if (done)
                  next_state = idle;
        write: if (!ready)
                   next_state = wait4rdy;
               else if (done)
                   next_state = idle;
        wait4rdy: if (ready & ~rd_n)
                      next_state = read;
                  else if (ready & rd_n)
                      next_state = write;
        default: next_state = idle;
    endcase
end

assign out1 = (((current_state == read) ||
                (current_state == write))
               && ready);

endmodule // enumfm

In the above example, the state variables current_state and
next_state are declared through enum. Without any explicit assignments, the
simulator and synthesis tools assign the values linearly, incrementing
from idle=0, read=1, write=2, wait4rdy=3, etc. As more states get
added, the values simply continue to increment.

5.1.13 Illustrate the side effect of an implicit 1 bit wire declaration 
of a multi-bit port during instantiation. 

This is a common problem seen when connecting blocks with multi-bit
port sizes. Since Verilog implicitly declares 1-bit wires for undeclared
names used in port connections, the multi-bit port will be connected to a
single-bit wire. It is a WARNING and not an ERROR during compilation for
most simulators. If the WARNING messages are turned off, or if there are too
many of these WARNING messages, this issue can go undetected at
compile time, and become an error during functional simulation. For
example,

module lower (in1, in2, in3, out1, out2);
input  [7:0] in1, in2, in3;
output [7:0] out1, out2;

assign out1 = in1 & in2;
assign out2 = in1 | in2;

endmodule // lower

module upper1 (u_in1, u_in2, u_out11, u_out12,
               u_out21, u_out22);
input  [7:0] u_in1, u_in2;
output [7:0] u_out11, u_out12;
output [7:0] u_out21, u_out22;

reg  [7:0] reg1;
wire [7:0] wire1;

// Instantiating lower
lower U1 (
    .in1  (u_in1),
    .in2  (u_in2),
    .in3  (in3),      // causes an implicit 1 bit wire
    //.in3 (wire1),   // genuine 8 bit wire
    .out1 (u_out11),
    .out2 (u_out12)
); // lower

// Instantiating lower
lower U2 (
    .in1  (u_in1),
    .in2  (u_in2),
    .in3  (in3),      // causes an implicit 1 bit wire
    //.in3 (wire1),   // genuine 8 bit wire
    .out1 (u_out21),
    .out2 (u_out22)
);

endmodule // upper1

This issue is detected as an ERROR by the synthesis tools. Also, some of 
the editors, like EMACS, can be commanded to declare the intermediate 
wires of the correct width. 

SystemVerilog provides enhancements that can prevent this implicit 1-bit 
wire error. The implicit named port connection during module instantiations 
will not permit connections where the net is a different size than the port. 
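Within Verilog-2001 itself, a further safeguard that is not discussed above is the `default_nettype none compiler directive, which turns any use of an undeclared net into a compilation error. The following is a minimal sketch; the module names lower_dn and upper_dn and the constant driven onto wire1 are assumptions for illustration.

```verilog
`default_nettype none   // undeclared names are no longer implicit 1-bit wires

module lower_dn (
    input  wire [7:0] in1,
    input  wire [7:0] in3,
    output wire [7:0] out1
);
    assign out1 = in1 & in3;
endmodule

module upper_dn (
    input  wire [7:0] u_in1,
    output wire [7:0] u_out1
);
    wire [7:0] wire1 = 8'hFF;   // explicitly declared 8-bit net

    lower_dn U1 (
        .in1  (u_in1),
        .in3  (wire1),   // an undeclared name here would now fail to compile
        .out1 (u_out1)
    );
endmodule

`default_nettype wire   // restore the default for files compiled afterwards
```

With the directive in effect, every net must be declared explicitly, including the net type of ports, which is why the port declarations above spell out wire.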

5.1.14 Same variable used in two loops running simultaneously 

Sometimes a loop variable is accidentally used in two different blocks
(often in for loops), and is modified in both places. Although this is
syntactically correct, it causes functional problems. For example, in
the following code, the same variable, “i”, is being modified in two
different loops, which could be difficult to detect in large pieces of code:

module twoloops;
integer i;

initial begin // start at time 0
    for (i = 0; i <= 7; i = i + 1) begin
        #5 $display("Entered 1st loop at t=%0d, i = %0d", $time, i);
    end
end

initial begin // start at time 0
    for (i = 0; i <= 7; i = i + 1) begin
        #2 $display("Entered 2nd loop at t=%0d, i = %0d", $time, i);
    end
end

endmodule // twoloops

The output of the above code produces displays as: 

Entered 2nd loop at t=2, i = 0 
Entered 2nd loop at t=4, i = 1 
Entered 1st loop at t=5, i = 2 
Entered 2nd loop at t=6, i = 3 
Entered 2nd loop at t=8, i = 4 
Entered 1st loop at t=10, i = 5 
Entered 2nd loop at t=10, i = 6 
Entered 2nd loop at t=12, i = 7 
Entered 1st loop at t=15, i = 8

Note that the iteration from 0-8 is shared between the two loops. In a few 
rare occasions, this could be a genuine requirement in behavioural coding, in 
which case the user needs to verify that the intended functionality is 
correctly met. 

However, SystemVerilog has a useful feature of declaring the variable
within the for loop so that the variable is local to that loop only. The same
for loops in the above example, in SystemVerilog, would be:

module twoloops;
// integer i; // is now local within the for loops

initial begin // start at time 0
    for (int i = 0; i <= 7; i = i + 1) begin
        #5 $display("Entered 1st loop at t=%0d, i = %0d", $time, i);
    end
end

initial begin // start at time 0
    for (int i = 0; i <= 7; i = i + 1) begin
        #2 $display("Entered 2nd loop at t=%0d, i = %0d", $time, i);
    end
end

endmodule // twoloops

The output of the above SystemVerilog code would produce all the 8 
iterations from both loops as follows: 

Entered 2nd loop at t=2, i = 0 
Entered 2nd loop at t=4, i = 1 
Entered 1st loop at t=5, i = 0 
Entered 2nd loop at t=6, i = 2 
Entered 2nd loop at t=8, i = 3 
Entered 1st loop at t=10, i = 1 
Entered 2nd loop at t=10, i = 4 
Entered 2nd loop at t=12, i = 5 
Entered 2nd loop at t=14, i = 6 
Entered 1st loop at t=15, i = 2
Entered 2nd loop at t=16, i = 7
Entered 1st loop at t=20, i = 3 
Entered 1st loop at t=25, i = 4 
Entered 1st loop at t=30, i = 5 
Entered 1st loop at t=35, i = 6 
Entered 1st loop at t=40, i = 7 

Some of the considerations in using a variable declaration within the for 
loop are: 

• The local declaration of the variable within the for loop causes the
variable to have automatic properties; that is, it will not be overwritten
when the same name is used in multiple loops, as illustrated in the above
example.

• The loop variable is visible only within the for loop, and not outside
it. If the variable needs to be accessed outside the for loop, it must be
declared explicitly outside the loop.

5.1.15 Illustrate the side effects of multiple processes writing to the
same variable.

When the same variable is assigned in two different processes, it not only
creates race conditions during simultaneous assignments, it also becomes
non-synthesizable. Most simulators allow compilation to proceed, since it
is syntactically correct. For example,

module blknonblk (clk, in1, in2, in3, out1);
input clk, in1, in2, in3;
output out1;

reg reg1, reg2, reg3, out1;

always @(posedge clk) begin : proc1
    reg1 <= in1;
    out1 <= reg1 & reg2;
end

always @(in1 or in2) begin : proc2
    reg1 = in3;
    // reg1 is assigned in proc1 and proc2. Incorrect.
    reg2 = in2;
end

endmodule

Most linting tools are able to detect this, and it is also reported as a
compilation error during synthesis.

5.1.16 Illustrate the side effect of specifying delays in assignments.

Any kind of delay specified before an assignment, or within an
assignment, in a blocking or nonblocking procedural assignment is ignored
by synthesis tools. If the functionality depends upon the presence of the
delay, then a mismatch in functional simulation will be seen between the
model and the synthesized netlist. For example,

reg1 = #3 reg2; // #3 will be ignored

#6 reg3 <= reg4; // #6 will be ignored

Since the above constructs are syntactically legal, the synthesis tools will
issue a WARNING and not an ERROR.

SUMMARY 

This chapter discussed the common functional mistakes that happen
when coding in Verilog. Most of these errors go undetected, as the code
is syntactically correct and, hence, gets past compilation. While
many of these are unintentional errors, a preview of these scenarios will
help readers debug more easily in the different stages of the
project cycle. Any workarounds that could help in avoiding these mistakes
have also been discussed.




Chapter 6 

VERILOG DURING SIMULATION 
REGRESSIONS 



INTRODUCTION 

This chapter discusses using Verilog for regression testing. In particular,
we discuss the requirements for pre-release regression testing, and the issues 
encountered during such regression simulations. As the design sizes continue 
to grow, so does the complexity of verification and the regression runtime. 
We discuss some special constructs of Verilog that help in meeting some of 
the needs of pre-release regression simulations. In most cases, Verilog alone 
is not sufficient in constructing the regression infrastructure. Regression 
environments are typically wrapped in programming languages like C, and 
scripting languages like Perl, TCL, make or csh. Scripting languages 
typically constitute the control/logistical flow of the regression environment. 
This chapter discusses specifically how these logistics can be aided in 
implementing the release infrastructure efficiently, and how the constructs in 
Verilog help in achieving the same. 

The development of a good regression environment is not something that 
can be deferred until the end of code development and testing. It is an 
essential infrastructure that needs to be built right at the architectural 
definition phase. This way, the changes done during the development phase 
can also make use of this infrastructure to validate the changes. 

6.1.1 Illustrate a few important considerations on simulation 
regressions, and how Verilog can be useful for achieving the 
same. 

While the regression environment is quite unique to each product, the 
following are a few generic requirements that are useful for a release 
simulation: 

1. The user must be able to turn off the waveform dumps during 
the regression. This not only saves disk space, but also improves 
the overall runtime. In Verilog, this can be controlled in a variety of 
ways, as illustrated in the following: 

a. Controlling the dump operation via a command line 
argument. This can be implemented through the `ifdef 
compiler directive and the $dumpvars system task. An 
example to illustrate the same is: 

`ifdef DUMP_ON 
    initial begin 
        // a single initial block guarantees that $dumpfile 
        // executes before $dumpvars 
        $dumpfile("waves.dump"); 
        $dumpvars; 
    end 
`endif 

During the command invocation, the following needs to be 
appended at the end: 

+define+DUMP_ON 

b. In the above method, either the waveforms of all the 
variables in the design are recorded (a $dumpvars call without 
arguments dumps the entire model), or there is no 
waveform recording at all. If the waveforms need to be 
dumped at specific hierarchical levels, this can be controlled 
by specifying the depth in the first argument of the $dumpvars 
command, and the hierarchy in the arguments that follow. 
For example, 

$dumpvars(0, testtop.U1); 

This dumps all variables in the instance testtop.U1 and in all 
hierarchies below it, and no other modules. 

$dumpvars(1, testtop); 

This dumps the variables at the testtop level only, 
that is, level 1, and none below level 1. 

Note that more than one module can be mentioned after the 
first argument, for specifying additional hierarchies to be 
dumped. 

The value of the depth can also be passed from the command 
line, using the Verilog $value$plusargs construct. 
An example of how to use $value$plusargs is given later in this chapter. 
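
As a minimal sketch of passing the dump depth from the command line, the 
following module reads a hypothetical plusarg named dump_depth and hands it 
to $dumpvars. The plusarg name, the file name, and the default of 0 are 
assumptions made for illustration, and the sketch assumes that the simulator 
accepts a variable as the depth argument. 

```verilog
module testtop;

  integer dump_depth;   // hypothetical variable, set from +dump_depth=<n>

  initial begin
    // fall back to depth 0 (all levels) when the plusarg is absent
    if (!$value$plusargs("dump_depth=%d", dump_depth))
      dump_depth = 0;

    $dumpfile("waves.dump");
    $dumpvars(dump_depth, testtop);
  end

endmodule
```

With this in place, appending +dump_depth=1 to the simulator invocation 
would restrict the dump to the variables of testtop itself. 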

c. In the above method, the dumping of the waveform happens 
right from time 0 onwards, until the end of simulation. 
Sometimes it is necessary to capture the waveform only for a 
small window of simulation. This could be the duration in 
the zone of interest in the entire simulation. Verilog provides 
a mechanism to capture a specific window, too, using the 
$dumpon and $dumpoff commands, as illustrated in the 
following example: 

module test_dumping; 

reg clk; 

initial begin 
    clk = 0; 
    $dumpvars; 
    $dumpoff; 
    #50; 
    $display("Dumping ON at t=%0d", $time); 
    $dumpon; 
    #100; 
    $display("Dumping OFF at t=%0d", $time); 
    $dumpoff; 
    #75; 
    $display("Simulation actually ends at t=%0d", $time); 
    $finish; 
end 

initial begin 
    // the delay precedes the assignment, so that clk = 0 above 
    // always takes effect before the first toggle 
    forever #5 clk = ~clk; 
end 

endmodule // test_dumping 

If the dump file is viewed through a waveform viewer, it will 
be evident that there was no dumping until time unit 50, and 
after 150, until the end of simulation. 

This approach is useful for one other purpose, too, as 
follows: 

• In long simulation runs, the user is typically interested in the 
dump for only a few transactions/scenarios before the 
erroneous timestamp. The bug/error timestamp is known 
from the previous iteration, so during the next iteration of 
debugging it is useful to place the $dumpon a few 
transactions before that timestamp. The resulting dump file 
loads much faster in the waveform viewer than a full-length 
dump file. 

• This approach also creates smaller dump files 
for debugging and, hence, uses less disk space. 

• Many times, in order to update the product 
specifications, that is, either the functional specification 
or the user documentation, it is necessary to add timing 
diagrams. In that case, the transactions can be run on the 
DUT, and the VCD dump can be captured for just the window 
of interest, showing the transaction scenario with the 
signals of interest. The captured waveform can then be 
used for the timing diagrams in the product 
specifications. 
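
The dump window itself can be made configurable, so that it can be moved 
close to a known error timestamp without editing the testbench. The 
following is only a sketch: the plusarg names dump_start and dump_stop and 
the file name are hypothetical, and a dump_stop of -1 is used here to mean 
"keep dumping until the end of simulation". 

```verilog
module dump_window;

  integer dump_start, dump_stop;  // hypothetical plusarg-controlled times

  initial begin
    if (!$value$plusargs("dump_start=%d", dump_start)) dump_start = 0;
    if (!$value$plusargs("dump_stop=%d", dump_stop))   dump_stop  = -1;

    $dumpfile("window.dump");
    $dumpvars(0, dump_window);
    $dumpoff;                         // nothing is recorded before dump_start

    #(dump_start) $dumpon;            // open the window
    if (dump_stop > dump_start)
      #(dump_stop - dump_start) $dumpoff;   // close it again
  end

endmodule
```

In a real testbench, this initial block would sit in the top-level module, 
and the run would be launched with, for example, +dump_start=50 
+dump_stop=150. 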

d. During regressions, the dump files of the runs need to be 
stored only optionally. For example, if a test PASSED in a self- 
checking testbench, it is very unlikely that the dump file is 
required for debugging purposes, other than for viewing 
the waveform for an explicit check. The dump file of a passing 
test would only occupy a large amount of disk space 
unnecessarily. In such a case, the 
dump file can be deleted if the test passed, and retained if the 
test did not pass. This feature can be made configurable 
through specially implemented configuration commands, or 
by passing specific arguments from the command line. 

2. The provision to generate or not generate the log files (transcripts of 
the simulation run) should be controllable at the command invocation 
level, or through a specialized configuration 
command, or through a parameter in an include file read before the 
simulations begin. Just like the waveform dump files explained above, 
the log files sometimes take a lot of disk space and runtime. These 
factors will not be significant if one or two tests are run, but they 
add up when there are several thousand tests to be run. The following 
are a few ways to control the dumping of the log file: 

a. For command invocation, the mechanism is exactly the same 
as described in the above description for the dumping of 
waveform. This mechanism is usually the most convenient. 

b. For non-command invocation control, the facility to dump or 
not dump could be controlled by a user defined configuration 
command configure, which is basically a Verilog task for 
the user. For example, 

configure(log_file_dumping, true); 

This can be used within the messaging commands to use 
$display or $fdisplay, depending upon whether 
log_file_dumping is false or true. For example, 

task display_msg(...); 
    if (log_file_dumping) begin 
        $fdisplay(int_log_file, <string>); 
    end else begin 
        $display("<string>"); 
    end 
endtask 
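
The pseudocode above leaves the argument lists open. One possible way to 
fill them in is sketched below; the option identifier LOG_FILE_DUMPING, the 
64-character message width, and the file name sim.log are assumptions made 
for illustration, not part of the original example. 

```verilog
module msg_ctrl;

  // hypothetical option identifier understood by configure
  parameter LOG_FILE_DUMPING = 0;

  reg     log_file_dumping;
  integer int_log_file;

  // sets a named testbench option
  task configure(input integer option, input value);
    case (option)
      LOG_FILE_DUMPING: log_file_dumping = value;
      default: $display("WARNING : unknown option %0d", option);
    endcase
  endtask

  // routes a message to the log file or to the console
  task display_msg(input [8*64-1:0] msg);
    if (log_file_dumping)
      $fdisplay(int_log_file, "%0s", msg);
    else
      $display("%0s", msg);
  endtask

  initial begin
    log_file_dumping = 1'b0;
    int_log_file = $fopen("sim.log", "w");

    display_msg("printed on the console");
    configure(LOG_FILE_DUMPING, 1'b1);
    display_msg("written to sim.log");

    $fclose(int_log_file);
  end

endmodule
```

Keeping all messages behind one task means that the destination of the 
whole transcript can be switched from a single place. 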

3. It must be easy for the user to add, modify, or delete tests, 
preferably by modifying just one file. 
Typically, during the development phase, the regressions can fail on 
particular tests. In that case, the test infrastructure should have the 
facility to run just a single test case, or a subset of test cases, with 
ease, typically by just modifying the list of test cases in a file. The 
other way could be to specify the subset of tests to be run in the 
command line argument itself. While the latter is okay for a small 
number of tests with short names, it may not work in 
shells that limit the number of characters on the command 
line. When the file approach is used for a group of tests, a special file 
parser is required, which reads the file and invokes the test mentioned in 
each line. 

An example of the file parser approach is discussed in FAQ 6.1.2 of 
this book. 

4. If multiple CPUs are present, they should be scheduled 
optimally for the regression. This is not a 
Verilog feature, but a useful functionality provided by batch-scheduling 
software. This way, the sessions get queued on multiple CPUs and 
each CPU is used optimally, to complete a given sub-task of the full 
run. Batch-scheduling software is also useful when the number of 
tool licenses is limited and the jobs have to be run sequentially, 
based on the availability of the licenses. 

5. The provision to display the results of the full regression must be 
available to the user at any time during the regression. This should 
not be considered a post-processing task at the end of a long 
simulation. Early feedback tells the user whether there is a problem 
in the regression, and helps in deciding whether to 
terminate the regression, if it is not worthwhile to proceed further. 
The results should be reported at the resolution of a single test, 
up to the last test completed. Some of the key features to be 
displayed are: 

a. Hostname of the machine where the test was run 

b. Number of tests passed 

c. Number of tests failed 

d. Number of tests timed out 

e. Seed value of the test run (in case of random testing) 

f. Date/time test started 

g. Date/time test ends 

h. Total number of tests run 

i. List the tests passed/failed/timeout into separate files 

j. If Failed, what was the string of failure, for example, data 
mismatch? 

k. Which was the source/destination that encountered the 
failure, that is, whether the DUT was involved, or was the 
error between agents involved in a traffic test without DUT? 

l. Runtime taken for each test and an accumulated summary 

m. If memory is critical, then the memory required for the tests 

The summary can be displayed through a Perl 
script or csh script, or even through Verilog, using its file I/O 
capabilities. The file parsing/interpreting mechanism discussed in the 
earlier sections can be useful for this. As an extension, it would be 
useful to display the result in HTML format, so that 
all interested team members can view the results through a web 
browser. 
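
As a sketch of the Verilog file I/O route, the module below appends one line 
per test to a shared summary file, so that the results can be inspected 
while the regression is still running. The file name, the plusarg names, 
and the test_failed flag are hypothetical. Items such as the hostname and 
the wall-clock date/time are assumed to be added by the wrapper script. 

```verilog
module result_logger;

  reg [8*32-1:0] test_name;
  integer        seed;
  integer        summary_file;
  reg            test_failed;   // hypothetical flag, set by the checkers

  // appends one line per test to the summary file
  task report_result;
    begin
      summary_file = $fopen("regress_summary.txt", "a");
      $fdisplay(summary_file, "%0s seed=%0d end_time=%0t %0s",
                test_name, seed, $time,
                test_failed ? "FAILED" : "PASSED");
      $fclose(summary_file);
    end
  endtask

  initial begin
    test_failed = 1'b0;
    if (!$value$plusargs("test_name=%s", test_name)) test_name = "unnamed";
    if (!$value$plusargs("seed=%d", seed))           seed = 1;

    #100;            // placeholder for the actual test
    report_result;
    $finish;
  end

endmodule
```

Opening the file in append mode and closing it immediately keeps every 
completed test visible to a script that counts the PASSED and FAILED lines. 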

6. Sometimes it is useful to pass the name of the test as an argument to 
the simulation session. This will be useful to print the test name 
in print messages, or to create the log file based on the 
name of the test. The Verilog $value$plusargs system function 
can be used for this purpose. It searches the 
command line arguments for a given pattern, and assigns the value 
that follows the pattern to a variable within the testbench. Note that you can 
give multiple such command line inputs; each unique string 
assigns its own destination variable. For example, 

module tmp_task; 

reg [8*32-1:0] test_name; // room for 32 characters 

initial begin 
    if ($value$plusargs("test_name=%s", test_name)) 
    begin 
        $display("Starting test %0s at %0d", test_name, $time); 
    end 
end 

endmodule 

When the above module is simulated with the command line input of: 

% <simulator_executable> +test_name=mytest 

the output produced is: 

Starting test mytest at 0 

Similar to the above, multiple such arguments can be communicated 
into the testbench environment. The command line arguments can be 
passed from a wrapper script that launches these tests. 

Note that command line plusargs that are not looked up by any 
$value$plusargs call in the testbench are ignored. That is, if the 
testbench code has implemented checks for 4 plusargs, and additional 
plusargs are mentioned in the command line, the additional ones get 
ignored by the testbench. 

7. All the inputs to the regression should be checked before the launch 
of the long regression. These inputs could be any or all of the 
following variables: 

• Parameter values for the regression have to be specified. These 
values can be specified through $value$plusargs, as explained 
earlier, or through the parameter files. If a specific parameter is 
not specified, then a default value should apply. 

• Presence of a sufficient number of tool licenses to launch single or 
parallel runs. This check needs to be done by the launching 
script, whether it is in Perl or TCL or csh. 

• Checking availability of system memory for the regression. This 
check needs to be done by the launching script, whether it is in 
Perl or TCL or csh. 

• Checking availability of disk space, considering the 
log and dump files that get produced during the regression. 
This check needs to be done by the launching script, whether it is 
in Perl or TCL or csh. 
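
The parameter check in the first bullet can be done inside the testbench at 
time 0, before any long-running stimulus starts. The following sketch 
assumes a hypothetical plusarg num_xns with a default of 100 and a legal 
range of 1 to 10000; all three values are illustrative. 

```verilog
module check_inputs;

  parameter DEFAULT_NUM_XNS = 100;    // hypothetical default
  parameter MAX_NUM_XNS     = 10000;  // hypothetical legal upper bound

  integer num_xns;

  initial begin
    // apply the default when +num_xns=<n> is not on the command line
    if (!$value$plusargs("num_xns=%d", num_xns)) begin
      num_xns = DEFAULT_NUM_XNS;
      $display("INFO : num_xns not specified, using default %0d", num_xns);
    end

    // reject illegal values at time 0, before the long run starts
    if ((num_xns < 1) || (num_xns > MAX_NUM_XNS)) begin
      $display("ERROR : num_xns=%0d is outside the range 1 to %0d",
               num_xns, MAX_NUM_XNS);
      $finish;
    end else begin
      $display("INFO : running %0d transactions", num_xns);
    end
  end

endmodule
```

The license, memory, and disk space checks have no Verilog equivalent and 
remain with the launching script. 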

8. Since the product could be supported on multiple platforms, the 
scripts need to work on multiple platforms, like Solaris, 
Linux, HP-UX, etc. As a first order requirement, simulation scripts 
should be platform-independent to the extent possible. If platform- 
specific constructs are used in scripts, then these should be 
customized for each of the supported platforms. The script should be 
intelligent enough to detect the platform on which it is being run. 
Sometimes, the version number of the Operating System (OS) also 
matters, as does the need for certain patches. The script should 
automatically detect the platform and the version number, and do 
everything appropriate to the particular platform of execution. 

9. If the design has multiple parameters that influence the 
functionality of the DUT dramatically, the regression must be run 
across these parameters. Examples of such parameters 
include varying bus widths, varying clock frequency ratios between 
externally accessible clock domains, endianness of the data path, widths 
and depths of FIFOs used, etc. The scripts should have the capability 
to cycle through all legal combinations of parameters, and run 
simulations for each combination. 
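
One way for a script to cycle through parameter values without editing the 
source is to route each parameter through a macro with a default. This is a 
sketch only: the macro name BUS_WIDTH and the module are hypothetical, and 
the command line syntax for defining a macro with a value (commonly 
+define+BUS_WIDTH=16) is tool-specific. 

```verilog
// default used when the macro is not defined on the command line
`ifndef BUS_WIDTH
  `define BUS_WIDTH 8
`endif

module param_tb;

  parameter WIDTH = `BUS_WIDTH;

  reg [WIDTH-1:0] data;

  // the DUT would be instantiated here with its width set to WIDTH

  initial begin
    data = {WIDTH{1'b1}};
    $display("Running with WIDTH=%0d, data=%0h", WIDTH, data);
  end

endmodule
```

The script then compiles and runs the same testbench once per legal value, 
changing only the macro definition on the command line. 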

10. There should be facility within the regression to switch between 
automatic command generation, using random stimulus generation, 
and executing specific sequences of commands, using a directed 
flow. This should preferably be controlled by a single flag, through a 
specialized user defined task like configure. For example, 

configure(auto_command_generation, `true); 

This will cause the automatic command generation through random 
command stimulus generation until all its constraints are met. The 
flag can be set to `false by default. The objective of this 
mechanism is that there should eventually be only one testbench that 
switches between the directed and random command generation with 
ease. The user shouldn’t have to maintain two testbenches, that is, 
one for random stimulus generation only, and one for directed 
stimulus generation flow. 
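
A minimal sketch of such a single testbench is shown below. The flag is 
modelled as a plain reg instead of going through the configure task, and the 
4-bit command encoding, the directed sequence, and the seed are all 
hypothetical. 

```verilog
module cmd_gen;

  reg        auto_command_generation;  // 0: directed (default), 1: random
  integer    seed, i;
  reg [3:0]  cmd;
  reg [3:0]  directed_cmds [0:3];      // hypothetical directed sequence

  // returns the next command from the selected source
  task next_command(input integer idx, output [3:0] c);
    if (auto_command_generation)
      c = $random(seed);               // truncated to 4 bits
    else
      c = directed_cmds[idx % 4];
  endtask

  initial begin
    auto_command_generation = 1'b0;
    seed = 1;
    directed_cmds[0] = 4'h1;
    directed_cmds[1] = 4'h2;
    directed_cmds[2] = 4'h4;
    directed_cmds[3] = 4'h8;

    for (i = 0; i < 4; i = i + 1) begin
      next_command(i, cmd);
      $display("directed command %0d = %h", i, cmd);
    end

    auto_command_generation = 1'b1;
    for (i = 0; i < 4; i = i + 1) begin
      next_command(i, cmd);
      $display("random command %0d = %h", i, cmd);
    end
  end

endmodule
```

Because both sources sit behind the same task, the rest of the testbench 
does not need to know which flow is active. 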

11. During regressions, when all messages other than ERROR/FATAL 
are disabled at runtime, it is useful to get some “heartbeat” 
message, to know that the simulation is still in progress, and not 
stuck at some point. This needn’t be for every transaction; a 
heartbeat every 10 transactions, for example, is found to be good 
enough. At such instants, it would be useful to optionally display 
a message like: 

Simulation in progress after 10 transactions at t=100 
Simulation in progress after 20 transactions at t=200 

A simple logic to implement the above is illustrated below: 



module heart_beat; 

parameter max_timeout_limit = 500; 

reg clk, start_addr_ph; 
integer num_xns, timeout_track; 

initial begin 
    clk = 0; 
    num_xns = 0; 
    timeout_track = 0; // must be initialized, else it stays at x 
    start_addr_ph = 0; // testing purpose only 
    forever clk = #5 ~clk; 
end 

initial begin 
    $dumpvars; 
    // #1000 $finish; 
end 

// this always block detects addr phase every 
// alternate clock. This is for illustration only. 
// Actual DUT may give addr phase at varied clock 
// intervals 
always @(posedge clk) begin 
    start_addr_ph <= ~start_addr_ph; 
end 

always @(posedge clk) begin 
    if (start_addr_ph) begin 
        num_xns <= num_xns + 1; 
        timeout_track <= 0; 
        if (((num_xns % 10) == 0) && (num_xns != 0)) 
            $display("Simulation in progress after %0d transactions", num_xns); 
    end else begin 
        timeout_track <= timeout_track + 1; 
        if (timeout_track == max_timeout_limit) begin 
            $display("WARNING : Unusually long time of %0d clocks taken after start_addr_ph. Maybe the test hung. TIMEOUT", max_timeout_limit); 
            $display("INFO : Executing $finish in the file ./testbench/test_top.v"); 
            $display("INFO : Change the max_timeout_limit parameter in this file to increase the limit"); 
            $finish; 
        end 
    end 
end 

endmodule // heart_beat 

A few salient points regarding the above example are: 

• The value of timeout can be changed through the parameter to 
the appropriate acceptable limit. 

• In the above example, the heartbeat rate is 10 transactions. It can 
also be easily changed to 20, or 50, or to any preferred value. 

12. A constant monitoring by the various bus monitors in the testbench 
should be incorporated. The monitors can communicate to the 
testbench by assertion of an output port, or by setting of a flag in the 
testbench, or by the testbench having access to these flags within the 
monitors. In the following example, two different monitors, with 
instance names U1 and U2, are monitoring the bus activity in the 
testbench with their output ports status: 

wire [1:0] status1, status2; 
wire error_det; 

mod1_mon U1 ( 
    .in1 (in1), 
    .en (en), 
    .status (status1) 
); 

mod1_mon U2 ( 
    .in1 (in1), 
    .en (en), 
    .status (status2) 
); 

assign error_det = ((status1 == 2'b11) || 
                    (status2 == 2'b11)); 

The outputs of the multiple monitors are OR'ed to assert a critical 
error_det signal in the testbench. This can cause the simulation to 
terminate with a $finish, as illustrated in this simple example: 

always @(posedge clk) 
begin 
    if (error_det) begin 
        $display("Error detected in testbench at t=%0d", $time); 
        if (stop_on_error) begin 
            $display("Terminating regression"); 
            $display("Test FAILED"); 
            $finish; 
        end 
    end 
end 

13. It should be possible to identify whether an error message was 
caused by an intentional error or not. Many times, the log files are 
post-processed to search/grep for strings like ERROR. In order to 
distinguish between an intentional error and a real error, a 
suitable string should be displayed before the launch of the intentional 
error. This will help in isolating any false alarm during the 
post-processing of these error messages. The assertion of the global error 
flags or signals, as explained earlier, can be gated so that only 
unintentional errors assert them. 
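
The gating described above can be sketched with a single reporting task. 
The flag expect_error, the message prefixes, and the error strings are 
hypothetical. The prefix for intentional errors deliberately avoids the 
string ERROR, so that a grep for ERROR finds only the real ones. 

```verilog
module err_gate;

  reg     expect_error;   // hypothetical flag, set before injecting an error
  integer error_count;    // counts only the unintentional errors

  task report_error(input [8*48-1:0] msg);
    if (expect_error)
      $display("INTENTIONAL : %0s at t=%0d", msg, $time);
    else begin
      $display("ERROR : %0s at t=%0d", msg, $time);
      error_count = error_count + 1;
    end
  endtask

  initial begin
    error_count  = 0;

    // intentional error: announced first, and not counted
    expect_error = 1'b1;
    $display("INFO : launching intentional parity fault");
    #10 report_error("parity mismatch");
    expect_error = 1'b0;

    // real error: counted, and fails the test
    #10 report_error("data mismatch");

    $display("Test %0s", (error_count == 0) ? "PASSED" : "FAILED");
    $finish;
  end

endmodule
```

Only the second call increments error_count, so the final verdict reflects 
the unintentional error alone. 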

14. When the regressions are run during the development 
phase, it is likely that some tests will hang. There should be some 
kind of logic to detect this hang situation, and have a graceful exit in 
the form of a TIMEOUT. It is also useful to inform the user, through 
INFO messages, where values such as the timeout limit can be changed, 
and which module caused the exit of simulation. This has been 
illustrated in the heartbeat example above. 

15. If more than one license of the simulation tool is available, it 
is useful to plan the regression such that the runs execute in 
different directories. Typically, the launch of the simulation happens 
from a common directory like the .../sim (or its equivalent directory 
name in your project). Trying to run multiple runs from the same 
directory could cause dump, log, or any other output files to be 
overwritten. To avoid such overwriting, 
it is good to design the execution of the runs from different 
directories. This will allow multiple simulation runs to take place 
in parallel. 

6.1.2 What coding constructs of Verilog can be used during the 
various stages of designing a regression environment for 
simulations? 

Verilog has the following constructs built-in, which help in the 
regressions: 

1. $readmemh/$readmemb: These constructs help in loading memory 
data from a file. They are useful during 
microprocessor simulations, or in supplying vectors for a DUT, or 
simply for customized commands encoded into the various fields of a 
line. Examples to illustrate the above have been discussed earlier in 
FAQ 3.5.1 in the Verification chapter. 
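
As a brief reminder of the construct, the sketch below loads four bytes from 
a hypothetical file vectors.hex, assumed to hold one hexadecimal value per 
line, and prints them. 

```verilog
module vec_loader;

  reg [7:0] vectors [0:3];
  integer   i;

  initial begin
    // vectors.hex is a hypothetical file holding one hex byte per line
    $readmemh("vectors.hex", vectors);
    for (i = 0; i < 4; i = i + 1)
      $display("vectors[%0d] = %h", i, vectors[i]);
  end

endmodule
```

The memory contents can then be applied to the DUT, or decoded into 
command fields, from within the testbench. 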

2. Sometimes, file parsing is required, to read the arguments from 
certain lines of a file. Instead of writing a PLI routine just for this 
purpose, the built-in $fscanf system function of Verilog can be used. 
This function reads the file sequentially, and assigns the values of 
the fields to the destination variables according to the format 
string. Each field is typically separated by white 
space within the input file. This is a very easy way to specify the tests 
to be run within a file, along with their associated arguments. 

For example, in the following code, infile.txt is a file 
containing three fields per line. arg1 is a decimal variable, arg2 is a 
string, and arg3 is hexadecimal. The $fscanf function scans this file 
until the EOF is reached, and currently just displays what it reads. 
It can be extended further to launch the different simulation 
sessions. 

module tmp_task; 

reg [8*32-1:0] arg2; // room for 32 characters 
// arg2 can be string variable in SystemVerilog 
reg [7:0] arg1, arg3; 
integer infile; 

initial begin 
    infile = $fopen("infile.txt", "r"); 
    // $fscanf returns the number of fields matched, and -1 at EOF, 
    // so the loop runs only while all three fields are read 
    while ($fscanf(infile, "%d %s %h", arg1, arg2, arg3) == 3) 
    begin 
        $display("arg1 = %0d,\targ2 = %0s,\targ3 = %0h", arg1, arg2, arg3); 
    end 
    $fclose(infile); 
end // initial 
endmodule // module 

Suppose the infile.txt file contained the following: 

10 test1 0a 
11 test2 0b 
12 test3 0c 
13 test4 0d 

The following would be the output of the above code: 

arg1 = 10, arg2 = test1, arg3 = a 
arg1 = 11, arg2 = test2, arg3 = b 
arg1 = 12, arg2 = test3, arg3 = c 
arg1 = 13, arg2 = test4, arg3 = d 

In the same way as illustrated above, the arg1, arg2, and arg3 variables can 
actually be used internally for test launching and initialisation purposes. 

3. Some environments use a “reference-file” based approach for the 
vector comparison for PASS/FAIL criteria of a release regression. 
This is typically useful for a multi-simulator product regression, 
since the output file is independent of the HDL, and the responses can be 
compared across the different Verilog simulators. The $fstrobe 
command can be used for this purpose. An example to illustrate this 
is as follows: 

module test_fstrobe; 

parameter cycle_time = 10; 
parameter clk2q_time = 9; 
integer out_vec_file; 
reg clk; 
wire out1; 
wire out2; 
wire [3:0] out3, out4; 
reg [3:0] count; 
reg capture_vec; 

initial begin 
    out_vec_file = $fopen("out_vec_file", "w"); 
    forever #(cycle_time/2) clk = ~clk; 
end 

// wait for the clock edge first, so that capture_vec is 
// already initialized when it is tested 
always begin 
    @(posedge clk); 
    if (capture_vec) begin 
        #clk2q_time; 
        $fstrobe(out_vec_file, "%0t %b %b %0h %0h", $time, 
                 out1, out2, out3, out4); 
    end 
end 

initial begin 
    capture_vec = 1; 
    count = 0; 
    clk = 0; 
    #50 capture_vec = 0; 
    $finish; 
end 

always @(posedge clk) begin 
    count <= count + 1; 
end 

assign out1 = count[0]; 
assign out2 = count[2]; 
assign out3 = count; 
assign out4 = ~out3; 

endmodule // test_fstrobe 

The above code produces the file out_vec_file, 
with contents as: 

14 1 0 1 e 
24 0 0 2 d 
34 1 0 3 c 
44 0 1 4 b 

The disadvantage of the vector-based approach is the fact that the 
vectors do not carry information of functional correctness. It is useful 
only if the vectors have been inspected a priori by some other method, and 
declared to be “golden”. Another disadvantage of this method is the 
following: If a design is changed during the course of its life-cycle 
(for example, for bug-fixes or enhancements), then, depending on 
the nature of the changes, the originally captured golden vectors may 
no longer be valid. They would need to be re-captured, re-inspected 
and re-certified to be “golden”. In short, a vector file guarantees only 
a clock cycle accurate reproduction of a response that has been 
verified once before. 
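
The comparison against the golden file is often done with a script, but it 
can also be sketched in Verilog with $fscanf. In the following sketch the 
file name golden_vec_file is hypothetical, and both files are assumed to use 
the five-field line format written by the $fstrobe example above. 

```verilog
module vec_compare;

  integer golden, actual;                 // file descriptors
  integer g_cnt, a_cnt, line, mismatches;
  reg [63:0] g_t, a_t;                    // timestamps
  reg        g1, g2, a1, a2;
  reg [3:0]  g3, g4, a3, a4;

  initial begin
    golden = $fopen("golden_vec_file", "r");   // hypothetical golden file
    actual = $fopen("out_vec_file", "r");

    if ((golden == 0) || (actual == 0)) begin
      $display("ERROR : cannot open the vector files");
    end else begin
      line = 0;
      mismatches = 0;
      g_cnt = $fscanf(golden, "%d %b %b %h %h", g_t, g1, g2, g3, g4);
      a_cnt = $fscanf(actual, "%d %b %b %h %h", a_t, a1, a2, a3, a4);

      // compare line by line while both files deliver complete vectors
      while ((g_cnt == 5) && (a_cnt == 5)) begin
        line = line + 1;
        if ({g_t, g1, g2, g3, g4} !== {a_t, a1, a2, a3, a4}) begin
          mismatches = mismatches + 1;
          $display("MISMATCH at line %0d", line);
        end
        g_cnt = $fscanf(golden, "%d %b %b %h %h", g_t, g1, g2, g3, g4);
        a_cnt = $fscanf(actual, "%d %b %b %h %h", a_t, a1, a2, a3, a4);
      end

      // both files must have ended cleanly (EOF is reported as -1)
      if ((g_cnt != -1) || (a_cnt != -1))
        mismatches = mismatches + 1;

      $display("Vector comparison %0s",
               (mismatches == 0) ? "PASSED" : "FAILED");
      $fclose(golden);
      $fclose(actual);
    end
    $finish;
  end

endmodule
```

A length difference between the two files is counted as a failure, since a 
truncated response file would otherwise pass unnoticed. 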

4. When the same testbench is being used for multiple configurations of 
the DUT, that is, for running RTL simulations, gate level 
simulations, or behavioural model simulations, the instantiation of 
the DUT must be easily selectable from the command line itself. This 
can be achieved by the use of the `ifdef compiler directive, together 
with a +define+ command line argument. An example to illustrate this 
is as follows: 

`ifdef RTL 
    initial $display("Instantiating RTL\n"); 
    // Instantiate the top level of RTL module 
`endif 

`ifdef GATE 
    initial $display("Instantiating Gate level netlist\n"); 
    // Instantiate top level of Gate level netlist 
`endif 

`ifdef BEHAV 
    initial $display("Instantiating Behavioral level\n"); 
    // Instantiate top level of Behavioral level 
`endif 

During the simulation invocation command line, the following can be 
appended: 

% <simulation invocation> +define+RTL 
% <simulation invocation> +define+GATE 

Note that the gate level simulations could be slow, due to the 
presence of system timing check commands, like $setup or $hold, 
built into the simulation models of the cells of the technology 
library. If running the gate-level simulation is a requirement, some 
Verilog simulators have switches that ignore these timing checks, 
so that only the functional simulation is run, to make sure the logic is 
okay. The timing checks are now being done more through Static 
Timing Analysis (STA). 

SUMMARY 

This chapter discussed how the Verilog constructs could be used for the 
product simulation regression purposes. The different Verilog constructs that 
influence the simulation during invocation and runtime have been illustrated. 
The chapter also discussed how the simulation session can influence the log 
and the dump file generation. 